You are a lifesaver, my friend. I was not aware that a Data Flow Task can be composed of so many other components. Usually we just see a Flat File Source along with an OLE DB Destination, but this was more advanced. Thanks!
You are most welcome Antwan Wimberly.
Thanks my friend. You saved my day with this video. In my case it was not Excel, but rather a SQL query. The addition that I incorporated into my ETL is to query by Insertion Date and Update Date (2 fields that my table already had). Thank you very much and greetings from Santiago de Chile.
Glad it worked for you. So happy to know that it helped you in some way. Thank you.
Thank you!
Thank you so much for your support, Pablo.
This video saved me a lot of headaches. Thank you so much.
Nice to hear from you. I would also recommend the video below; it covers almost the same topic with the best approach:
th-cam.com/video/8dRIjVhKoKs/w-d-xo.html
Your work is appreciated... thanks a lot.
Thank you so much.
Many thanks for this good explanation.
Thank you Ahmed.
Thank you from Thailand.
You are most welcome.
Really nice, thank you so much. You gave me good insight. Just one question: is there a way to update only the records that got an actual update? For example, in your case you delete 10 rows from the DB table, but the other 20 are exactly the same. So what is the point of updating something that has not changed?
Thanks for your comment. Yes, of course there is a way to check whether the data has changed or not, and after verifying that the data has changed we can update only those records. To check if the data has changed, in the Data Flow Task, just before inserting the data to be updated, we should add a Conditional Split transformation and compare the values of the source and destination fields, like
FirstName == Dest_FirstName && LastName == Dest_LastName
In the expression above, FirstName is the column from the source data and Dest_FirstName is the column from the destination data. We can name this result set Unchanged records; they don't need to be updated. The default result set from the Conditional Split will then be the records that changed, and those can be updated.
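For reference, the same idea in T-SQL would look roughly like the sketch below; this is only an illustration, and the table and column names (StagingCustomer, DimCustomer, FirstName, LastName) are placeholders, not the ones from the video.
-- Update only the rows that actually changed (assumes the compared columns are not NULL)
UPDATE d
SET    d.FirstName = s.FirstName,
       d.LastName  = s.LastName
FROM   dbo.DimCustomer     AS d
JOIN   dbo.StagingCustomer AS s
       ON d.ID = s.ID
WHERE  d.FirstName <> s.FirstName
   OR  d.LastName  <> s.LastName;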
Can you provide a video on finding inserted, updated, and deleted records into separate tables using HASHBYTES?
Is there a way to create a table in the SQL database with information from a table in Excel that is being imported via SSIS?
Yeah, you would need to write dynamic SQL to get the column names, etc., and then prepare the CREATE TABLE statement from there.
You can use a cursor to prepare the SQL query and in the end just execute that query.
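Just as a rough sketch of what that dynamic SQL could look like, assuming the Excel column names and chosen SQL types have already been captured into a hypothetical helper table dbo.ExcelColumns (ColumnOrder, ColumnName, SqlType); STRING_AGG needs SQL Server 2017 or later, otherwise a cursor or FOR XML PATH does the same job.
-- Build and run a CREATE TABLE statement from captured Excel column metadata (helper table is hypothetical)
DECLARE @sql NVARCHAR(MAX);

SELECT @sql = N'CREATE TABLE dbo.ImportedFromExcel (' + CHAR(13)
            + STRING_AGG('    ' + QUOTENAME(ColumnName) + ' ' + SqlType, ',' + CHAR(13))
              WITHIN GROUP (ORDER BY ColumnOrder)
            + CHAR(13) + N');'
FROM dbo.ExcelColumns;

EXEC sp_executesql @sql;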
@@learnssis How would this dynamic SQL work? Do you have any video explaining it?
@@rpais1979 Sorry I don't have any video on this.
How can we tell what fields got updated? For example the last name changes in the source. How can we flag/highlight the updated field to show that the Last Name changed? I have Premium Excel Files as my output after the Conditional Split. I would like to highlight or somehow show the Last Name is the field that got updated for a particular record.
This can only be done if you join the source and destination on a field like Id and then compare each field.
Learn SSIS, thanks for the reply. Just to clarify, I understand that the fields would have to be compared. My question is how we achieve that. Currently I have a left outer Merge Join like you have shown in this video. How do we achieve it at the field level? Obviously we compare fields between the source and destination, but how exactly do we do that?
@@SharjeelButt In the Matched records output, we can use a Derived Column transformation and compare a source field, for example LastName, with the corresponding destination field, and name the output lastname_result. Its data type will be boolean (lastname_source == lastname_dest), so if the values are the same it will return True, and if the values are not the same, meaning there is an update, then lastname_result will contain False. We can then insert this new column into a SQL table and easily identify which records in the matched output have a False value.
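If it helps, the same per-field comparison can also be done afterwards in T-SQL; this is a rough sketch only, with placeholder table and column names (StagingCustomer, DimCustomer) joined on ID.
-- Flag which fields changed for each matched record (1 = changed, 0 = unchanged)
SELECT s.ID,
       CASE WHEN s.FirstName <> d.FirstName THEN 1 ELSE 0 END AS FirstName_Changed,
       CASE WHEN s.LastName  <> d.LastName  THEN 1 ELSE 0 END AS LastName_Changed
FROM   dbo.StagingCustomer AS s
JOIN   dbo.DimCustomer     AS d
       ON d.ID = s.ID;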
@@learnssis When I add a Derived Column component after my Conditional Split on the matched records output, the list of available columns in my Derived Column is only from the source table. How can I get the list of columns from my destination?
How can you handle the same issue if the primary key is an identity column and is referenced by multiple foreign keys? This is the real problem when updating and inserting into another table on another server.
Try to first update/insert into the master tables and then update/insert into the tables that have the foreign keys.
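Just to illustrate the ordering, here is a rough T-SQL sketch with made-up tables (Customer as the master with an identity key CustomerId plus a natural BusinessKey, and [Order] as the child): the identity value itself is never copied to the other server; the child rows look it up through the business key after the master has been loaded.
BEGIN TRANSACTION;

-- 1) Master table first, matched on the natural/business key, never on the identity value
MERGE dbo.Customer AS dest
USING dbo.StagingCustomer AS src
    ON dest.BusinessKey = src.BusinessKey
WHEN MATCHED THEN
    UPDATE SET dest.Name = src.Name
WHEN NOT MATCHED THEN
    INSERT (BusinessKey, Name) VALUES (src.BusinessKey, src.Name);

-- 2) Child table next; the foreign key CustomerId is resolved by looking up the master row
MERGE dbo.[Order] AS dest
USING (SELECT so.OrderNumber, c.CustomerId, so.Amount
       FROM dbo.StagingOrder AS so
       JOIN dbo.Customer     AS c ON c.BusinessKey = so.CustomerBusinessKey) AS src
    ON dest.OrderNumber = src.OrderNumber
WHEN MATCHED THEN
    UPDATE SET dest.CustomerId = src.CustomerId, dest.Amount = src.Amount
WHEN NOT MATCHED THEN
    INSERT (OrderNumber, CustomerId, Amount) VALUES (src.OrderNumber, src.CustomerId, src.Amount);

COMMIT TRANSACTION;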
Thank you for this. How would this work if we have multiple tables? Please suggest the best way if there is one. Thank you.
Can we do this using Lookup and Conditional Split components? Kindly answer.
Of course we can do the same using Lookup and conditional split and that is the best approach. Here is the video on that approach.
th-cam.com/video/8dRIjVhKoKs/w-d-xo.html
@@learnssis thank you for quick reply.
Hello,
But Sort is a blocking transformation; would it be recommended for huge data like billions of rows?
No, we should use the Lookup transformation for huge data. I just made the video to show that this is also an option, but it should be used for small data sets; for large data sets we should use the Lookup transformation. I am planning to create a video on that very soon. Thanks.
I am getting an error when using the Merge Join: "both inputs of the transformation must contain at least one sorted column, and those columns must have matching metadata." Please can you advise?
Just before the Merge Join transformation, use a Sort transformation and sort the data, for example on the Id column, for both inputs. Thanks.
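One more option, in case your inputs come from an OLE DB Source rather than a flat file (an assumption about your setup): sort in the query itself and then, in the source's Advanced Editor, set IsSorted = True on the output and SortKeyPosition = 1 on the Id column, so the Merge Join gets pre-sorted input without the blocking Sort transformation.
-- Sort at the source so the Merge Join sees pre-sorted input (table name is a placeholder)
SELECT ID, FirstName, LastName
FROM dbo.DimCustomer
ORDER BY ID;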
@@learnssis That's what I did, but I can't diagnose the error.
Please arrange these classes in order for easy understanding.
Hi Ahmed, very good job!
It would be nice to share your SQL scripts and CSV file examples!
Thanks a lot Ahmed.
Thank you brother, I have shared them. If you check the description of the video, there is a Google Drive link from which you can download them.
@@learnssis I am unable to see the Google Drive link, but without that it's very hard to practice and grasp the topic.
@@shubhamdhawan5845 I have uploaded all required files to a folder for this video. Also I have added the link to google drive.
@@learnssis Yes, now I can see the link, thanks.
@@shubhamdhawan5845 Great.
SSIS package could not open global shared memory to communicate with the performance DLL. I am using Windows 10.
Please take a look at this URL, I don't have much experience with this issue.
stackoverflow.com/questions/11227397/ssis-package-could-not-open-global-shared-memory-to-communicate-with-performance
Thank you so much, Aqil. Very good job on the big data file. But do you have any idea how I can make this package generic and reusable with any CSV file, with generic mapping, respecting the same conditions as in this case? Thanks for the whole demo.
It won't be easy to make this process dynamic so that it can process any arbitrary CSV file.
@@learnssis Please, can I generate an XML configuration file for this program to update the connection parameters, the table name, the column names, and the types?
@@lamiaaelammari5172 I once saw a stored procedure at a company that they were using for incremental data updates from a source table to a destination table.
So first, using the C# code, you can import any CSV file into a SQL table, and then, using a SQL Server stored procedure, you can easily insert/update data in the destination table from your source table.
You can take help from ChatGPT to generate the code for you.
For example, below is the code to insert/update data into a single table:
MERGE INTO DestinationTable AS dest
USING SourceTable AS src
    ON dest.ID = src.ID -- Assuming ID is the matching key
-- When matched (record exists in destination)
WHEN MATCHED THEN
    UPDATE SET dest.Name = src.Name,
               dest.LastUpdated = GETDATE()
-- When not matched (record doesn't exist in destination)
WHEN NOT MATCHED THEN
    INSERT (ID, Name, LastUpdated)
    VALUES (src.ID, src.Name, GETDATE());
----------------------------
And below is the dynamic code to insert/update any table. This code I got from ChatGPT.
DECLARE @sourceTableName NVARCHAR(128) = 'SourceTable';
DECLARE @destinationTableName NVARCHAR(128) = 'DestinationTable';
DECLARE @mergeStatement NVARCHAR(MAX);
DECLARE @insertColumns NVARCHAR(MAX);
DECLARE @sourceColumns NVARCHAR(MAX);
DECLARE @updateSet NVARCHAR(MAX);
-- Comma-separated column list for the INSERT clause
SELECT @insertColumns = STRING_AGG(QUOTENAME(name), ', ') WITHIN GROUP (ORDER BY column_id)
FROM sys.columns
WHERE object_id = OBJECT_ID(@sourceTableName);
-- The same list prefixed with src. for the VALUES clause
SELECT @sourceColumns = STRING_AGG('src.' + QUOTENAME(name), ', ') WITHIN GROUP (ORDER BY column_id)
FROM sys.columns
WHERE object_id = OBJECT_ID(@sourceTableName);
-- dest.col = src.col pairs for the UPDATE clause (the matching key ID is excluded;
-- LastUpdated is assumed to exist only in the destination table)
SELECT @updateSet = STRING_AGG('dest.' + QUOTENAME(name) + ' = src.' + QUOTENAME(name), ', ') WITHIN GROUP (ORDER BY column_id)
FROM sys.columns
WHERE object_id = OBJECT_ID(@sourceTableName)
  AND name <> 'ID';
-- Build the dynamic MERGE statement
SET @mergeStatement = '
MERGE INTO ' + QUOTENAME(@destinationTableName) + ' AS dest
USING ' + QUOTENAME(@sourceTableName) + ' AS src
ON dest.ID = src.ID -- Assuming ID is the matching key
-- When matched (record exists in destination)
WHEN MATCHED THEN
UPDATE
SET ' + @updateSet + ',
dest.LastUpdated = GETDATE()
-- When not matched (record doesn''t exist in destination)
WHEN NOT MATCHED THEN
INSERT (' + @insertColumns + ', LastUpdated)
VALUES (' + @sourceColumns + ', GETDATE());';
-- Execute the dynamic MERGE statement
EXEC sp_executesql @mergeStatement;
@@learnssis Thank you very much Aqil for your answer and your suggestion. I find it relevant for me, and I will follow it.
I just want to lay out the steps to follow so you can confirm whether they are okay or not.
First, I create a Script Task in SSIS and use the same program as yours to pick up any file, following the video "78 How to import multiple csv files into sql server | Import all CSV files from folder to sql server".
Then I create a stored procedure that merges and updates the table.
Question: I have to call this stored procedure through an Execute SQL Task and pass parameter values. Where exactly do I pass them:
when I run the package, or when I make a config.xml file?
I prefer to create an XML file. How can I use the values from the XML file to run the package? If you see that the logic is not correct, please correct me. If you have videos or links in this direction, please share them.
Your suggestions mean a lot to me.
@@lamiaaelammari5172 1. In the first step, you will import the CSV file using the same C# code that I used to import the data into a SQL Server table.
2. In the second step, you will import the data from the XML file into a SQL staging table.
3. In the third step, you will call the stored procedure, passing the SQL table name and maybe the joining column, etc. Because you have the XML file stored in the database, you can easily get the parameter values from the staging table and pass them to the dynamic stored procedure, so that the stored procedure knows which source and destination tables need to be updated.
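For step 3, just as a rough sketch: with an OLE DB connection, the Execute SQL Task would run something like the statement below, with the ? placeholders mapped to your SSIS variables on the Parameter Mapping tab. The procedure name usp_IncrementalMerge and its parameters are hypothetical; they assume the dynamic MERGE above has been wrapped in a stored procedure.
-- Hypothetical Execute SQL Task statement (OLE DB connections use ? parameter markers)
EXEC dbo.usp_IncrementalMerge ?, ?;
-- Parameter 0 -> SSIS variable with the source (staging) table name
-- Parameter 1 -> SSIS variable with the destination table name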
How can I roll back if for some reason the insert or update fails?
Did you try transactions in SSIS?
th-cam.com/video/YdbIfkfJqLQ/w-d-xo.html&lc=UgyQrKO_bRZWafat_eF4AaABAg
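And if the insert/update itself runs as T-SQL (for example from an Execute SQL Task or a stored procedure), a minimal sketch of an explicit rollback looks like this; the table names are placeholders.
BEGIN TRY
    BEGIN TRANSACTION;

    -- insert the new records (placeholder statement)
    INSERT INTO dbo.DimCustomer (ID, FirstName, LastName)
    SELECT s.ID, s.FirstName, s.LastName
    FROM dbo.StagingCustomer AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.DimCustomer AS d WHERE d.ID = s.ID);

    -- update the existing records (placeholder statement)
    UPDATE d
    SET d.FirstName = s.FirstName, d.LastName = s.LastName
    FROM dbo.DimCustomer AS d
    JOIN dbo.StagingCustomer AS s ON d.ID = s.ID;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;  -- nothing is left half-applied
    THROW;                     -- re-raise so the SSIS task reports failure
END CATCH;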
Good job!
Glad to know that you liked the video.
Can this be done from one database to another?
Yes, it can. This can even be done from one server to another. And the best solution for a differential update is using a Lookup:
th-cam.com/video/8dRIjVhKoKs/w-d-xo.html
Hello Aqil. Such a clear explanation, but I have a doubt: what if we have multiple source files and we need to update and insert a few records in a table in SQL Server? Also, if the target table has 50M rows, what method do you suggest?
Thanks for your question Hyma.
I would suggest you use the method from the video below, which uses the Lookup transformation:
th-cam.com/video/8dRIjVhKoKs/w-d-xo.html
If you have multiple source files, then you can load them into a staging table in SQL Server, and then you can use the Lookup transformation to check which records from the source are new, which are modified records that need to be updated, and which records do not need to be updated.
@@learnssis Do I need to use a Foreach Loop to load multiple files into the staging table?
@@hymapranusha6090 If the structure of the files is the same, then yes, you can use a Foreach Loop Container to loop through the files and load them into the staging table.
@@learnssis Thank you so much Aqil :)
@@hymapranusha6090 No problem.
Yes, very good video.
Thanks for your comment.
Sir, what is the use of ON [PRIMARY]?
ON [PRIMARY] means the table will be created on the primary filegroup in the database. In a SQL Server database there can be multiple filegroups, such as primary and secondary.
If your database was created on a disk that has filled up and the database needs to expand, you can create a secondary filegroup, create an .ndf file, and assign that file to the secondary filegroup. Then, when data is loaded into the tables inside the database, the growth goes to the secondary filegroup and that data is stored in the .ndf file.
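For illustration only (the database name, file path, and table are made up):
-- Add a secondary filegroup with an .ndf data file
ALTER DATABASE MyDb ADD FILEGROUP SecondaryFG;
ALTER DATABASE MyDb ADD FILE
(
    NAME = N'MyDb_Data2',
    FILENAME = N'D:\SQLData\MyDb_Data2.ndf'
) TO FILEGROUP SecondaryFG;

-- ON [PRIMARY] puts a table on the primary filegroup (the default);
-- this table goes to the secondary filegroup instead
CREATE TABLE dbo.ArchiveOrders
(
    OrderId INT NOT NULL
) ON SecondaryFG;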
The video is very useful, thanks for it. One quick question: why do you keep the id column as varchar instead of int? In which case does that work? A response is much appreciated. Thanks.
I think I kept the id as varchar in case the data comes from a source where the id is not an integer. You can keep the id as int. Thanks.
Nicely explained.
Good to know that you liked the video. Thanks.
Good job Ahmed.
Have you tried this technique with really big tables? I'm talking 100 millions of rows.
And if you know, is it faster than using a Lookup?
Thank you in advance and keep up the good work.
No, I have not. The Lookup method will be faster than this. And there is one more way, the hash method, which should be faster than the Lookup; I will make a video on that one as well.
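As a small preview of the hash idea (a sketch only; the table and column names are placeholders, and CONCAT_WS needs SQL Server 2017 or later): one hash value per row lets you compare many columns at once instead of field by field.
-- Rows whose hash differs between staging and destination are the changed rows
SELECT s.ID
FROM   dbo.StagingCustomer AS s
JOIN   dbo.DimCustomer     AS d ON d.ID = s.ID
WHERE  HASHBYTES('SHA2_256', CONCAT_WS('|', s.FirstName, s.LastName))
    <> HASHBYTES('SHA2_256', CONCAT_WS('|', d.FirstName, d.LastName));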
@@learnssis Yes, and the Lookup method will surely be faster. For example, take the last 30 days of data, put it in a temporary table, and then do the lookup on the temporary table and push the non-existent data into the final tables. This method is very useful; I used it on a table of 1 billion rows and it worked just fine.
@@ExonaQ Correct.
Very good, Aqil. For some packages it is showing a disabled option; how can I download the package now?
We are working to get that fixed within a week.
Awesome, thanks.
Good to know that you liked the video.
Thank you
You are most welcome.
Why drop and create the table and not just truncate the table if it exists? If it doesn't exist, then create the table.
Yeah, we can use truncate as well if the table exists; both options can be used.
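A minimal sketch of the truncate-if-exists variant (the staging table name and columns are placeholders):
IF OBJECT_ID(N'dbo.StagingCustomer', N'U') IS NOT NULL
BEGIN
    TRUNCATE TABLE dbo.StagingCustomer;   -- table already exists, just empty it
END
ELSE
BEGIN
    CREATE TABLE dbo.StagingCustomer      -- first run: the table does not exist yet
    (
        ID        VARCHAR(50),
        FirstName VARCHAR(100),
        LastName  VARCHAR(100)
    );
END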
Thank you very, very much.
You are welcome. Good to know that you liked the video.
This tutorial is perfect for me, but I have to redo the whole process for each table and for each database. I have several tables, so I have to redo the same thing; is there no solution to make this package generic and reusable?
I have replied to your other comments, there is no easy way to make the process dynamic.
Thanks sir
Can you place the Derived Column code in a Notepad file?
Okay, I have uploaded it.
Thanks...
Glad to know that you liked the video.
Nice video. Good job, Aqil. Could you please provide videos on how to do an incremental load from Excel, if possible?
Hi, thanks for your comment. The logic for the incremental load will be the same as what I have shown for the text file. So if you use an Excel file instead of a CSV file as the source, the logic will remain the same; only the source component will change from a Flat File Source to an Excel Source. Thanks.
@@learnssis Hi, I need to download the package... please help me with that.
@@PowerWithInYou555 You need to make a free account on www.dataplatformlabs.com and then you can get the course; it's completely free. Then you should be able to download it. Thanks.
Thanks
Welcome.
It's good.
Thank you Ravi for your comment.