Storage Event Trigger and How To Automate Your ADF Pipeline Run

  • Published 22 Oct 2024

Comments • 10

  • @梁文隆-m9p 6 months ago +1

    Is it possible to add a date parameter in the blob path so that the pipeline can be triggered by the source system's daily data push?

    • @thecloudbox 6 months ago

      You can parameterize it and pass the date accordingly.
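
      A minimal sketch of what that parameterization can look like, using the azure-mgmt-datafactory Python SDK (the resource names, the pipeline `pl_daily_load`, and its parameters are hypothetical placeholders): the trigger forwards the path of the blob that fired the event, and the pipeline parses the date out of it.

```python
# Sketch: a storage event trigger that forwards the blob path to the
# pipeline as parameters. All names/placeholders here are hypothetical.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobEventsTrigger,
    PipelineReference,
    TriggerPipelineReference,
    TriggerResource,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

trigger = BlobEventsTrigger(
    # Scope is the storage account the events come from.
    scope="/subscriptions/<subscription-id>/resourceGroups/<rg>"
          "/providers/Microsoft.Storage/storageAccounts/<account>",
    events=["Microsoft.Storage.BlobCreated"],
    # Path filters are literal strings in the form /<container>/blobs/<path>.
    blob_path_begins_with="/bronze/blobs/dbms/hadoop_hive/",
    pipelines=[
        TriggerPipelineReference(
            pipeline_reference=PipelineReference(
                type="PipelineReference", reference_name="pl_daily_load"
            ),
            # The event payload carries the blob's folder path and file name;
            # bind them to pipeline parameters and derive the date inside
            # the pipeline.
            parameters={
                "folderPath": "@triggerBody().folderPath",
                "fileName": "@triggerBody().fileName",
            },
        )
    ],
)

client.triggers.create_or_update(
    "<rg>", "<factory>", "tr_blob_created", TriggerResource(properties=trigger)
)
# The trigger still has to be started (begin_start in recent SDK versions)
# before it fires.
```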

    • @梁文隆-m9p 6 months ago

      @thecloudbox Never expected such a prompt reply, thanks and subscribed!
      It seems not to work: nothing matches if I add a variable or regular expression in the blob path. Let's say the blob path is `bronze/dbms/hadoop_hive/partition_day=20240410`; if you set it to `bronze/dbms/hadoop_hive/partition_day=@getPastTime(1,'Day','yyyyMMdd')`, no matching path is returned and the pipeline cannot be triggered either.
      Besides that, if Hadoop dist_cp runs more than one map task, the destination in blob storage ends up with more than one file, and the pipeline is triggered multiple times. Apparently it can be solved by setting the number of maps to 1, but that is not elegant.
      If I use Fabric to do this, note that the data in Hadoop still needs a batch job, and once the data is prepared it is pushed (by dist_cp) to blob storage. I would prefer to invoke the Fabric pipelines once the data is ready, but the event trigger is not supported in the Fabric version of ADF. The lakehouse way would be to create a Fabric shortcut to "pull" the data, but since we built a hybrid on-prem & cloud architecture, pull is not the best way; we expect the Hadoop job and the Fabric pipelines to run separately. Are there any solutions?
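
      On the first two problems: the trigger's `blobPathBeginsWith`/`blobPathEndsWith` filters are compared as literal strings against the event's path, not evaluated as ADF expressions, which is why `@getPastTime(...)` never matches. A common workaround, sketched below under the assumption that the Hadoop job can write a single marker blob (e.g. `_SUCCESS`) after all data files land, is to filter the trigger on that one name and recover the date from the event payload inside the pipeline.

```python
# Sketch of the marker-file workaround, reusing the trigger from the sketch
# in the reply above. Assumption: the Hadoop job writes a single "_SUCCESS"
# blob after all data files have landed, so a multi-map dist_cp copy fires
# the trigger exactly once instead of once per file.
trigger.blob_path_begins_with = "/bronze/blobs/dbms/hadoop_hive/"
trigger.blob_path_ends_with = "_SUCCESS"  # only the marker blob matches

# Inside the pipeline, recover the date from the event payload instead of
# putting an expression in the path filter, e.g. in a Set Variable activity:
#   @last(split(triggerBody().folderPath, 'partition_day='))
# -> "20240410" for folder path "bronze/dbms/hadoop_hive/partition_day=20240410"
```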

  • @murshidameen5009 2 months ago

    I want to process PDFs into images. Can I install npm modules here?

  • @vaibhavgupta9856 10 months ago

    What if I want to overwrite the files in ADLS? Will the storage event trigger still run the pipeline?

    • @thecloudbox 10 months ago

      It depends on the file type and the other filters you have used in the trigger and pipeline; if everything still matches, the trigger will fire and the pipeline will run.
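
      A quick way to check this behaviour (a sketch using the azure-storage-blob SDK; the connection string, container, and blob names are hypothetical): overwriting an existing blob generally emits another `Microsoft.Storage.BlobCreated` event, so a trigger whose filters still match fires again.

```python
# Quick check sketch: overwriting an existing blob emits another
# Microsoft.Storage.BlobCreated event, so a matching trigger fires again.
# Connection string, container, and blob names are hypothetical.
from azure.storage.blob import BlobClient

blob = BlobClient.from_connection_string(
    "<connection-string>",
    container_name="bronze",
    blob_name="dbms/hadoop_hive/partition_day=20240410/data.csv",
)
with open("data.csv", "rb") as f:
    blob.upload_blob(f, overwrite=True)  # re-upload still counts as "created"
```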

    • @shubhamsingh-j5q 9 months ago

      I have many folders inside the container; in this case, how can I trigger the pipeline for each folder dynamically?

    • @thecloudbox 9 months ago

      You need to create multiple storage event triggers if your files are in different folders and you want to cover all of them.
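
      If the folder list is known up front, those triggers can be stamped out in a loop; a sketch reusing the client and model imports from the sketch higher up the thread (the folder names, container, and pipeline `pl_load` are hypothetical):

```python
# Sketch: one storage event trigger per folder, all invoking the same
# pipeline, which receives the folder via a parameter. Reuses the client and
# model imports from the earlier sketch; all names are hypothetical.
for folder in ["sales", "inventory", "customers"]:
    per_folder_trigger = BlobEventsTrigger(
        scope="/subscriptions/<subscription-id>/resourceGroups/<rg>"
              "/providers/Microsoft.Storage/storageAccounts/<account>",
        events=["Microsoft.Storage.BlobCreated"],
        blob_path_begins_with=f"/mycontainer/blobs/{folder}/",
        pipelines=[
            TriggerPipelineReference(
                pipeline_reference=PipelineReference(
                    type="PipelineReference", reference_name="pl_load"
                ),
                # Pass the folder that fired the event into the pipeline.
                parameters={"sourceFolder": "@triggerBody().folderPath"},
            )
        ],
    )
    client.triggers.create_or_update(
        "<rg>", "<factory>", f"tr_{folder}_created",
        TriggerResource(properties=per_folder_trigger),
    )
```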