Cannot thank you enough for your incredibly well laid out, thorough explanations. The world needs more folks like you :)
You are the example of how teaching should be done. Just awesome 👍
Wow, thank you!
You have in-depth knowledge of every service. I learned from scratch using your channel. Keep posting. Thank you and God bless you.
Awesome, thanks!
Thank you Adam! I had been trying to follow some other written content to do exactly what you showed with no success. Your precise steps and explanation of the process were so helpful. I am successful now.
I really, really like how you guide step by step like this; it is quite easy to understand. You are the best trainer I've seen. I really appreciate the time you spent creating these useful videos.
Your videos are great. This is the best channel on the YouTube platform to learn about ADF. THANKS 🙏😊
Thank you, Adam. I rewatch this video at least twice a year. Thank you for all you do.
Super helpful. We were breaking our heads over how to migrate 32,000 Oracle tables into ADL. This was so simple and helpful.
What wonderful content you have put out on social media. What a world-class personality you are. People certainly fall in love with your teaching.
I am a beginner in Azure Data Engineering and you made it simple to learn all the tactics. Thanks!
Glad to hear that!
Very professionally demonstrated and very clear to understand. Thank you very much
It's my pleasure Paul! :)
You're the best Adam! Thanks for all the help, been watching your tutorials on ADF and they're very helpful. Keep them coming!
My pleasure! Thanks Alberto!
The videos are very clear for people who would like to learn and practice. Thanks a lot. Your hard work is appreciated.
Hey, one thing about English - please correct me if I am wrong, guys, but I am pretty sure about this - you shouldn't say inside a sentence "how does it work", but "what it works". That aside, the content is awesome!
You can if you ask a question. "How does it work" is a question structure, not a statement. It should be "how it works" if I'm stating a fact. You wrote "What it works", but I assume that's a typo. It's one of my common mistakes; my English teacher tries to fix it, but it is still a common issue for me ;) Thanks for watching!
It is extremely hard to find information online about this topic. Thank you for making it easy!
Glad it was helpful! Thanks!
Thank you! It's underappreciated how important it is to name things something other than "demoDataset", but it makes a big difference both for understanding concepts and for maintainability.
Glad it was helpful! You are of course correct: if it's not a demo, then take care of your naming conventions.
The way you explain is super Adam. Really nice
Hi Adam
Your videos are just brilliant. This is a subscription I wouldn't mind paying for to support you. Your lessons are invaluable for learning.
Awesome, thank you!
Adam, you are just awesome, man! The way you teach is excellent. Keep it up, you are the best!
Thanks! 😃
Thank you so much for the clear and nice explanation, I am new to ADF and learning a lot from your channel
Great to hear!
This video was really helpful! You have leveled up my Azure skills. Thank you sir, you have gained another subscriber.
Great and simple walkthrough, good job Adam
Thank you, I appreciate it! :)
Hello Adam,
Thanks for this demo. Your channel is a blessing for new learners.
Brilliant teaching style Adam. Very watchable. I particularly like how you explain the background. I've subscribed and will watch more of your videos.
Excellent, Excellent video. This has truly cemented the concepts and processes you are explaining in my brain. You are awesome, Adam!
THANK YOU SO MUCH for this! The step-by-step really helped with what I needed to do.
You are a very good teacher.
Thank you! 😃
very simple yet powerful explanation
Glad you think so!
Amazing video. Complex topic perfectly explained. Thank you Adam
Incredibly simplified to learn from. Great!!
Great video, easy to follow and to the point, really helped me to quickly get up a running with data factory.
Glad it helped!
You are a legend. Next level editing and explanation
Fantastic clear-cut explanation. Nice job!
Glad it was helpful!
It was so perfect, I was able to follow along and copy data on the first attempt. Thanks!
Hi Adam, very nice work. I built this for a client of mine and found out one important thing: within the For Each, not all blocks are executed as if they were working together atomically. What I mean is: if you start two items in parallel using the For Each block, and within the For Each block you have two blocks - say block A and block B - connected using parameters (item()), block A working on item X will not necessarily be paired with block B working on the same item X, even though they are connected!
So I want to suggest one extra piece of advice: use at most one block inside a For Each block if it uses parameterized blocks, or, if you need more than one block, start a separate pipeline from within the For Each block that contains the multiple blocks. These pipelines will be started as separate children and do the work in the correct order.
With kind regards,
Jeroen
Hey, not sure I understood what you meant here. Using parameters does not create any connection between the actions.
@AdamMarczakYT I'm using a For Each loop to load tables with dynamic statements. If I need more than one block (like a logging call to SQL Server, a copy block to load the data, and a logging block after the load is done), those blocks can sit in the For Each loop itself. But if you start different table loads in parallel, the blocks will not follow each other sequentially - they will run interleaved with each other, so the logging will not necessarily belong to the copy block next to it, for example. I will see if I can make an example if I find the time. To solve this, I always start another pipeline within the For Each and put the blocks in that pipeline. This creates child pipelines inside the For Each loop, ensuring the right order of execution of the blocks (logging start, copy, logging end).
Thanks Adam, your tutorials are very useful, hope to see more in the future
Glad you like them! Will do more!
Your skills are top-notch, thanks. I'd love to see your channel grow.
I appreciate that!
Awesome explanation; the way you teach, explaining things in layman's terms, is pretty great. Thanks!!
Awesome, Adam, there can't be a better way to explain this.
Thank you so much for sharing this valuable knowledge. It's very helpful for me.
Glad it was helpful!
Hey Adam. Great video! Two questions regarding the pipeline itself. 1. How do we approach source version control of the pipeline? In SSIS we could export a package and commit it to Git or use TFS. How do we approach versioning in Azure? 2. What is the approach to deploying this pipeline to a higher environment? Assuming this pipeline was created in dev, how do I approach deployment to, e.g., UAT?
I think this page describes and answers both of your questions. docs.microsoft.com/en-us/azure/data-factory/continuous-integration-deployment?WT.mc_id=AZ-MVP-5003556 thanks for watching :)
Thanks Adam, amazing workshop, very clear and easy to follow. Thanks for helping, I am wiser now :)
Perfect! Thank you!
I was looking for this video. Thanks for making this. It helps a lot. Thanks again.
Wow! What a great video: very easy, step-by-step tutorials and explanations. Well done!
Very interesting video, Adam. I found your idea of storing metadata quite enlightening. It could probably be maintained separately, tracking the last record loaded, so we could use it as an input for delta loads through queries instead of reloading the full table on each run.
You can use either the watermark or the change tracking pattern; check this out: docs.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-overview?WT.mc_id=AZ-MVP-5003556
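For the watermark pattern the core idea is roughly this - a minimal sketch, assuming a made-up control table (dbo.WatermarkTable) and a LastModifiedDate column on the source:

-- Read the last stored watermark for this table (assumed control table).
DECLARE @OldWatermark DATETIME2 =
    (SELECT WatermarkValue FROM dbo.WatermarkTable WHERE TableName = 'SourceTable');
DECLARE @NewWatermark DATETIME2 = SYSUTCDATETIME();

-- Copy only rows changed since the last run.
SELECT *
FROM dbo.SourceTable
WHERE LastModifiedDate > @OldWatermark
  AND LastModifiedDate <= @NewWatermark;

-- After a successful copy, persist the new watermark for the next run.
UPDATE dbo.WatermarkTable
SET WatermarkValue = @NewWatermark
WHERE TableName = 'SourceTable';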
Thanks Adam, I was waiting for a video like this on ADF. Please keep making them regularly...
You got it!
Your videos are awesome man. Gave me a firm grasp and encouraged me to get an azure subscription and play around some more.
That is amazing to hear! Thank you!
Adam, you are the best! Thanks man!
Thank you :)
Excellent video and knowledge sharing. Great Job!
Glad you enjoyed it!
Thank you Adam!! These videos are really very helpful and build the foundation for understanding ADF.
My pleasure!
Thanks a lot for the videos. I am really grateful to see all the dedication and attention to detail in each video; the explanation, supporting slides, code and demo really cover the material well.
Glad you like them! Thanks!
Adam, thanks for all your great videos! I appreciate your work very much! Keep up the great work!
My pleasure! Thanks!
You are really the best, Adam! Your tutorial helped me a lot. Thanks
Happy to hear that!
@@AdamMarczakYT You are welcome. Please keep up the good work.
Very good explanation! I will try to read a list of tables, but instead of exporting them, mask certain columns. I guess I have to use a derived column inside the For Each loop, maybe, with three parameters: schema_name, table_name and column_name. But how do I build: update <schema>.<table> set <column> = sha2(<column>) where Key in (select Key from othertable) in a derived column context?
I researched and tried one thing, and it seems to work. Keep the Lookup and For Each as in this video, and inside the loop I put the function. There I build a dynamic SQL statement, picking up the variables from the table I have in the Lookup.
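A dynamic statement along those lines could look roughly like this - just a sketch, where the key column, the subquery target and the SHA-256 hashing via HASHBYTES are my assumptions; in the pipeline the three names would come from item():

-- Sample values standing in for item().schema_name / table_name / column_name.
DECLARE @schema_name SYSNAME = N'dbo';
DECLARE @table_name  SYSNAME = N'Customer';
DECLARE @column_name SYSNAME = N'Email';

-- Build and run the masking statement dynamically.
DECLARE @sql NVARCHAR(MAX) =
    N'UPDATE ' + QUOTENAME(@schema_name) + N'.' + QUOTENAME(@table_name) +
    N' SET ' + QUOTENAME(@column_name) +
    N' = CONVERT(NVARCHAR(64), HASHBYTES(''SHA2_256'', CAST(' +
        QUOTENAME(@column_name) + N' AS NVARCHAR(4000))), 2)' +
    N' WHERE [Key] IN (SELECT [Key] FROM dbo.OtherTable);';

EXEC sp_executesql @sql;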
Very well explained. Thank you!
Glad it was helpful!
Thank you! I really appreciate all you share, it truly helps me
Great content, easy to follow!!
Glad you think so!
Wow this was explained so well. Thank you!!!
Brilliant tutorial. Easy to follow and it all works like a charm. Thank you!!
Very nice video with good explanation.
Glad you liked it
Amazingly simple and informative!
Hi Adam, this is an awesome session. One question: where in the Azure documentation can I find the information regarding all possible output properties of a given activity? E.g., when you were explaining the Lookup activity, you talked about the property "firstRow". Where can I find such properties for all activities in the Azure documentation?
Thanks! For details, always check the docs by googling "ADF" plus the activity name. For Lookup this would pop up, which explains everything you asked: docs.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity
Thanks for your awesome video, it helped me out a great deal
Glad I could help
Very well explained & succinct. One request: if possible, create a video on loading an ADW (Synapse) data warehouse with ADF.
Thanks! I'm waiting for the new Synapse workspace experience to be released to make a video about it ;)
fantastic video Adam!! Really helpful to understand the parametrisation in ADF.
Great to hear that!
These tutorials are so useful!
Glad you like them!
Awesome. Thank you so much Adam!
My pleasure!
Thank You so much.... Very good explanation, Just Awesome
Hey Adam, awesome work and explanation! Do you have another video explaining how to deal with massive bulk data copies from tables using ADF, one that addresses the limits on the amount of data or number of rows? Can you make a video with a demo explaining how to deal with the kind of scenario you mentioned was a story for another day? Thanks a lot in advance!! =D
Thanks. Well, Lookup shouldn't be used for data but for a metadata-driven approach, so the 5000-row limit is very good here. It is rare that you will copy over 5000 tables/files with different structures, etc. If you do, there are different techniques, but in those cases I would probably shift the approach entirely. Will think about this.
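To illustrate, the Lookup in this pattern returns only one row per table, so a query along these lines (a sketch, not necessarily the exact one from the video) stays far below the limit even for large databases:

-- List every user table in the source database; one row per table.
SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE';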
Thanks for your great video. Is there any limitation in terms of the number of tables or table size when we copy multiple tables in bulk, in your experience?
Great session!! Thanks Adam.
My pleasure!
Thank you for the content. Is there a way to identify which copy activity ran for which table, e.g. through the pipeline output perhaps?
I don't see that info.
Hi Adam, is it possible to load the data into the ADLSGen2 as a delta file instead of CSV?
👍👍👍 very good explanation.. 👍👍.
You are awesome Adam. Thank you so much for detailed explanation.
My pleasure!
Very nice, Adam!!! I haven't checked yet, but do you have a video on the reverse process: reading the CSV files from blob storage and transferring the data to a table? If you have it, can you please provide the link?
Hey, thank you. I didn't make a video on the reversed scenario as I thought it might be too similar to this one. Just use Get Metadata instead of Lookup to get the list of files and upload them to a table. Note that for this to work, the files or paths should contain the name of the table, so you know which file goes to which table. :)
@AdamMarczakYT I was able to create the pipeline that reads the files from the blob and transfers the data to a SQL database. I would like to trigger that pipeline when a file arrives in the blob. Event Grid is activated and I created the trigger, but it didn't fire the pipeline. Any guidance? Thank you!
Super cool video, sir. You are my go-to for searching about Azure after Stack Overflow, LOL.
Sir, how about the cost? Is it better to use a sequential loop or parallel? Because using a sequential loop means extra running time.
Really great stuff, sir. This is what I am looking for on YouTube.
Thanks a ton!
Thank you for your great videos, they have been super helpful. I'm working on a proof of concept similar to this video, however it's SQL to Azure SQL. Any links or references you can offer to help me with parameterizing the Azure SQL sink side?
Are you using the VNet IR or self-hosted IR? The Copy activity is costly with the VNet IR.
Hi Adam, I really appreciate your video. Thanks for your videos! I hope you can also create a video with ODBC as the data source.
Thank you so much for this. It helped a lot
You are very good 👍 explained well thanks 😊
Thanks Adam. I have one query: when we create the .csv files, how can we add a trailer, i.e. a footer at the end of the file, which will have the count of rows? I.e., I want the end statement to be TRAILER|20 if the row count is 20.
Based on the video, you should be able to do it by modifying the query to include a row count column for each table, then use this column to construct the file name using the concat function.
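For example, the table list query could be extended with a row count column, roughly like this (a sketch assuming a SQL Server source; catalog views may differ elsewhere):

-- One row per table with its current row count, for use in the For Each.
SELECT s.name AS TABLE_SCHEMA,
       t.name AS TABLE_NAME,
       SUM(p.rows) AS TABLE_ROW_COUNT
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
JOIN sys.partitions p ON p.object_id = t.object_id AND p.index_id IN (0, 1)
GROUP BY s.name, t.name;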
@AdamMarczakYT Thanks for your quick response, Adam. But I don't want the row count in the file name; I want it in the footer of the file, i.e. inside the file. Can you please let me know if there is any way to do the same? Thank you in advance!!
Please do a video on Parameterized pipeline in detail with different parameters
Did you have a chance to check my video on ADF parametrization? th-cam.com/video/pISBgwrdxPM/w-d-xo.html
Fine tutorial. Thanks.
Glad it was helpful!
Best video I've watched today, crystal clear!! Thanks Adam.
I am trying to copy multiple Azure Table storage tables, but I'm struggling to get a query to list the table names for Azure Table storage in the Lookup activity. Any solutions, please? Thanks in advance!!
Excellent. Well, in this case replace the Lookup with a Web Activity, query the Table storage REST API docs.microsoft.com/en-us/rest/api/storageservices/query-tables?WT.mc_id=AZ-MVP-5003556 and loop over the response.
Hi Adam, thanks for the video! I am wondering if it is possible to specify just one row from the table by ID and copy it? Thanks in advance!
Great video but where is the story for another time about large lookup scenarios that require multiple pipelines? I'm in dire need of that solution! :)
At that point you would simply need an Azure architect to design the best solution for you. There is no single solution that works best in all cases. In larger solutions you need an experienced architect to understand all the requirements and design the best solution available.
@AdamMarczakYT Thank you for your reply, but I think it would be of great interest to all of us if you could present a simple scenario where a large JSON file with more than 5000 IDs (being used in an API call that accepts 500 IDs at once) is processed by this nested pipelines approach. There is no such example anywhere as of now...
Instead of a Lookup, can we use the Get Metadata activity to import the schema?
Some connectors support pulling the structure from the source docs.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity?WT.mc_id=AZ-MVP-5003556#supported-connectors but it would be pretty tricky to transform this into a dataset schema. Note that the dataset schema is automatically pulled from the database if you leave the schema empty.
Thanks! Very helpful videos!
Thanks a bunch!
great teacher
Thank you! 😃
Very nice video, I appreciate it. Do you have a follow-up video on incremental copy?
Not yet but check this tutorial from MS docs.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-overview?WT.mc_id=AZ-MVP-5003556 thanks for stopping by :)
Awesome video 👍 My requirement is the other way around: how can I copy from ADLS to Azure SQL? I have multiple files in multiple folders in the ADLS storage account.
First of all, thank you so much for the very useful video. Can you please create a video where we connect the ADF output file to Power BI for reporting, and show how we can auto-refresh it?
Thanks for the video, that's a great one.
I have a question: this pipeline will run on all the tables, or on some tables according to our logic, but what if we add new tables to the data source? If we run this pipeline again, it will run over all the tables again, right? How can we run it only on the newly added tables, not on the ones that already exist on the target?
Hope you can help me.
Thank you
Great Explanation !!!!
Have you made a similar dynamic-style video for copying data from Gen2 to SQL? If yes, please share the link.
Thanks
Hi. I don't plan to make a video on that because it's too similar to this video. You just need to use Get Metadata instead of Lookup over the blob and then reverse the sink with the source. But it's too similar an approach to make a separate video on it. Thanks for watching!
Good one, Adam sir.
Thank you :)
Hi Adam. It was a helpful session. I have a question. How can I identify the newly added rows in my Azure blob storage files? I am building a data pipeline from Azure Blob storage to Azure Postgres database. There will be only one file at source blob storage but the customer will upload the same file daily with new rows. Thanks in Advance.
Check this docs.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-overview?WT.mc_id=AZ-MVP-5003556 and this docs.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-partitioned-file-name-copy-data-tool?WT.mc_id=AZ-MVP-5003556, which describe how to do incremental uploads. Other than this, you can also load everything into a staging table and use a SQL MERGE or a stored procedure to merge your data.
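A minimal sketch of the staging-plus-merge option, assuming made-up table and column names:

-- Upsert from the staging table (loaded by the copy activity) into the target.
MERGE dbo.TargetTable AS tgt
USING dbo.StagingTable AS src
    ON tgt.Id = src.Id                     -- assumed business key
WHEN MATCHED THEN
    UPDATE SET tgt.Col1 = src.Col1,
               tgt.Col2 = src.Col2
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Col1, Col2)
    VALUES (src.Id, src.Col1, src.Col2);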