Load data from Azure Blob Storage into Python

  • Published Aug 21, 2024
  • Code below:
    from datetime import datetime, timedelta
    from azure.storage.blob import BlobServiceClient, generate_blob_sas, BlobSasPermissions
    import pandas as pd

    # enter credentials
    account_name = 'ACCOUNT NAME'
    account_key = 'ACCOUNT KEY'
    container_name = 'CONTAINER NAME'

    # create a client to interact with blob storage
    connect_str = 'DefaultEndpointsProtocol=https;AccountName=' + account_name + ';AccountKey=' + account_key + ';EndpointSuffix=core.windows.net'
    blob_service_client = BlobServiceClient.from_connection_string(connect_str)

    # use the client to connect to the container
    container_client = blob_service_client.get_container_client(container_name)

    # get a list of all blob files in the container
    blob_list = []
    for blob_i in container_client.list_blobs():
        blob_list.append(blob_i.name)

    df_list = []
    # generate a shared access signature for each file and load them into Python
    for blob_i in blob_list:
        # generate a shared access signature for each blob file
        sas_i = generate_blob_sas(account_name=account_name,
                                  container_name=container_name,
                                  blob_name=blob_i,
                                  account_key=account_key,
                                  permission=BlobSasPermissions(read=True),
                                  expiry=datetime.utcnow() + timedelta(hours=1))
        sas_url = 'https://' + account_name + '.blob.core.windows.net/' + container_name + '/' + blob_i + '?' + sas_i
        df = pd.read_csv(sas_url)
        df_list.append(df)

    # combine all files into a single DataFrame
    df_combined = pd.concat(df_list, ignore_index=True)

Comments • 41

  • @EwaneGigga 15 days ago +2

    Thanks a lot for this very clear video. I spent hours trying to do this until I luckily stumbled across your video. I agree that this video should definitely have more views!!

    • @dotpi5907 14 days ago

      I'm glad it helped. Thanks for the support!

  • @CapitanFeeder 6 months ago +1

    Videos like yours should have way more views. Thank you for what you do.

    • @dotpi5907 6 months ago

      Thanks so much! I really appreciate the support

  • @k2line706 3 months ago

    Fantastic video. Very clear explanation and clean code for us to follow. Thank you!

  • @RoamingAdhocrat 4 days ago

    Ooh. I support a SaaS app and _hate_ Azure Storage Explorer with a burning passion. If I can access logs etc. from Python instead of ASE, that would be a very happy rabbit hole to go down. I suspect I don't have access to those keys, though.

  • @dhruvajmeri8677 3 months ago

    Thank you for this video! It saved me time.

  • @charlieevert7666 1 year ago +1

    You da real MVP

    • @dotpi5907 1 year ago

      Thanks @charlieevert7666!

  • @ohaya1 1 year ago +1

    Mega like, thank you so much!

    • @dotpi5907 1 year ago

      Glad it helped!

  • @kartikgupta8413 7 months ago +1

    Thank you for this video

    • @dotpi5907 6 months ago

      Thanks for watching!

  • @AndresPapaquiNotario 1 year ago +1

    Thanks! Super helpful 👍

    • @dotpi5907 1 year ago

      That's great to hear @AndresPapaquiNotario

  • @investing3370 7 months ago +2

    What happens when you already have a SAS token on hand? Can it replace the account key?

    • @dotpi5907 7 months ago

      Hi @investing3370, try changing line 16 to:
      sas_token = 'your sas token'
      connect_str = 'DefaultEndpointsProtocol=https;AccountName=' + account_name + ';SharedAccessSignature=' + sas_token + ';EndpointSuffix=core.windows.net'
      You'll still need lines 11 and 13, but not line 12 (the account key).
      Let me know if that works
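
      A minimal sketch of loading the CSVs with that SAS-token client, since generate_blob_sas needs the account key (assumes the token grants read and list permissions, and container_client is built as in the description):
      import io
      container_client = blob_service_client.get_container_client(container_name)
      df_list = []
      for blob_i in container_client.list_blobs():
          # download each blob's bytes directly instead of building per-blob SAS URLs
          stream = container_client.download_blob(blob_i.name)
          df_list.append(pd.read_csv(io.BytesIO(stream.readall())))
      df_combined = pd.concat(df_list, ignore_index=True)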

  • @thisissuvv 1 year ago +1

    What if I have multiple directories inside the container and blob files are present inside those directories?
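
    A minimal sketch for this case: list_blobs walks nested "directories" (blob name prefixes) recursively, so nested files are already included in the description's loop; to restrict the listing to one folder, pass name_starts_with ('myfolder/' is a hypothetical prefix):
    for blob_i in container_client.list_blobs(name_starts_with='myfolder/'):
        blob_list.append(blob_i.name)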

  • @kevinduffy2428 2 months ago

    What if you did not want to bring the files down to the local machine? How would you process the files up on Azure, and run the Python code on Azure? For instance, the files were placed in blob storage and now you want to process them, clean them up, and save the results back out to blob storage. The Python code is not complicated; I'm just not sure what the pieces/configuration up on Azure would be.

  • @_the.equalizer_ 11 months ago +1

    Well explained! Actually I want to read a ".docx" file from blob storage. How can I do that?

    • @jsonbourne8122 10 months ago +1

      You will probably have to read it in bytes and store locally or create a BytesIO object first and then pass to python-docx
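
      A minimal sketch of that BytesIO approach, assuming the container_client from the description, python-docx installed, and a hypothetical blob name 'report.docx':
      import io
      from docx import Document
      # download the blob's bytes and hand them to python-docx via an in-memory buffer
      downloader = container_client.download_blob('report.docx')
      doc = Document(io.BytesIO(downloader.readall()))
      for para in doc.paragraphs:
          print(para.text)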

  • @learner-df2ns 11 months ago +1

    Hi Sir,
    Which version of pandas did you use? And can we load into a PySpark DataFrame instead of a pandas DataFrame? If yes, please share the syntax.

    • @benhiggs8834 11 months ago

      Hi @learner-df2ns, try replacing these lines:
      df = pd.read_csv(sas_url)
      df_list.append(df)
      df_combined = pd.concat(df_list, ignore_index=True)
      with these lines (Spark can't read an https SAS URL directly, so read each CSV with pandas and convert):
      from functools import reduce
      from pyspark.sql import SparkSession
      spark = SparkSession.builder.appName("CSVtoDataFrame").getOrCreate()
      df = spark.createDataFrame(pd.read_csv(sas_url))
      df_list.append(df)
      df_combined = reduce(lambda df1, df2: df1.union(df2), df_list)

  • @alexandrakimberlychavezaqu8290 1 year ago +2

    Thank you so much!!!!

    • @dotpi5907 1 year ago

      Glad it helped!

  • @yuthikashekhar1718 1 year ago +3

    Can we do the same for JSON files stored in blob storage?

    • @sohamjana3802 7 months ago

      I have the same question. Have you found a solution to your question?
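
      A minimal sketch, assuming a sas_url built as in the description but pointing at a JSON blob; pandas can read JSON from a URL just like CSV:
      df = pd.read_json(sas_url)  # for line-delimited JSON, pass lines=True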

  • @nikk6489 1 year ago +1

    @dotpi5907 can we load a saved Hugging Face model like this? If so, can you guide me on how? Or is there an alternate solution? Many thanks in advance. Cheers

    • @dotpi5907 1 year ago

      Hi Nik K, thanks for the comment! That sounds like a great idea for a video. I'm away for a few days, but when I get back I'll look into it and (all things going well) make a video on it.

  • @ericbixby 7 months ago

    Do you have any suggestions for how to then write a file in a similar fashion to the storage blob?
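
    A minimal sketch of writing back to the container, assuming the blob_service_client and df_combined from the description; 'output/combined.csv' is a hypothetical blob name:
    csv_bytes = df_combined.to_csv(index=False).encode('utf-8')
    blob_client = blob_service_client.get_blob_client(container=container_name, blob='output/combined.csv')
    blob_client.upload_blob(csv_bytes, overwrite=True)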

  • @satyakipradhan2359 1 year ago +1

    Getting HTTP Error 403: This request is not authorized to perform this operation using this resource type

    • @dotpi5907 1 year ago

      Hi @satyakipradhan2359, thanks for the comment. 403 means that your connection is working but you don't have permission, so your Azure account might have extra security on it.
      Try changing a few of the other options in the 'Generate SAS' tab, like adding your IP address to 'Allowed IP addresses' and checking that you have read permissions in the 'Permissions' dropdown. Hope that helps!

  • @sumitsp01 1 year ago +1

    Can we do a similar thing to load a video file from Azure blob storage using libraries like OpenCV?
    I want to load and analyze videos from blob storage inside Azure Machine Learning Studio.

    • @dotpi5907 1 year ago +1

      Hi Sumit Sp, thanks for the comment. That sounds like an interesting project, and it sounds like it can be done. I'll have a play around and let you know if I figure it out, maybe this weekend.

    • @sumitsp01 1 year ago +1

      @dotpi5907 Thank you for the reply. I tried the above and I am able to do it now. We can read videos from Azure blob storage by providing the correct URI path, then convert them into frames and store them in another location to use in our ML models.
      Now I'm looking for a way to read live video coming directly from a camera 😄
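
      A minimal sketch of that approach, assuming a sas_url built as in the description but pointing at a video blob; OpenCV (with FFmpeg support) can usually open an https URL directly:
      import cv2
      cap = cv2.VideoCapture(sas_url)
      ok, frame = cap.read()
      while ok:
          # process or save each frame here, e.g. with cv2.imwrite
          ok, frame = cap.read()
      cap.release()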

  • @user-bu3zh4rm5j 10 months ago

    I have an image dataset stored in Azure datastores file storage, and a model in Azure ML Studio. So how do I access the dataset?

  • @remorabay 5 months ago

    I have a Python script that reads an EDI file and, from there, creates unique data tags and elements (basically a CSV file with one tag and data field per line). I need a process to load this into Azure and, for the outbound side, to extract it back into the same tags + data. This looks close. Anyone interested in giving me a quote for this (can you show it working)? Thanks.

  • @nirajmodh5086 1 year ago +1

    Can we set the expiry time to be infinite?

    • @dotpi5907 1 year ago +1

      Hi Niraj, thank you for the comment and sorry for the late reply. I don't think you can set the expiry date to be infinite, unfortunately. The main reason is that if someone outside of your organization were to get hold of your SAS key, they would be able to access the file for as long as the SAS key is valid or until the file is deleted.
      I don't know too much more on this subject, but there is some more info here: learn.microsoft.com/en-us/azure/storage/common/sas-expiration-policy?tabs=azure-portal.
      Hope that helps!
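
      A minimal sketch: the window can be lengthened (though not made infinite) by adjusting expiry in the description's generate_blob_sas call:
      expiry=datetime.utcnow() + timedelta(days=30)  # longer-lived, but riskier if the URL leaks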

  • @surajbhu 11 months ago

    Is there a way to select files/blobs from an Azure container in a Flask application for further use, just like request.files.getlist('files') helps in selecting files from the local directory? Can someone help me with this?

    • @oujghoureda1303 4 months ago

      This worked for me:
      for blob in container_client.list_blobs():
          if blob.name.startswith('resumes/') and blob.name.endswith('.pdf'):
              blob_list.append(blob.name)