How To Handle Data Privacy In ML Projects? - Machine Learning Interview #8

  • Published on 18 Nov 2024

Comments • 81

  • @krishnaik06
    @krishnaik06  3 years ago +9

    This is an amazing question that was asked in the interview. Can you answer it?

    • @vishalbangari6900
      @vishalbangari6900 3 years ago +2

      Cape Privacy and Cape Python
      Cape Privacy helps teams share data and make decisions for safer and more powerful data science. Learn more at capeprivacy.com.
      Cape Python brings Cape's policy language to Pandas and Apache Spark. The supported techniques include tokenization with linkability as well as perturbation and rounding. You can experiment with these techniques programmatically, in Python or in human-readable policy files.

    • @maheshsharma7690
      @maheshsharma7690 3 years ago

      Hey Krish, I need your help with some career decisions. I have messaged you on LinkedIn; kindly reply.

    • @AnimeFanClub786
      @AnimeFanClub786 3 years ago

      Sir, we need more questions like this.
      There is a lot of knowledge we can only gain through experience, so these kinds of videos are very helpful.

    • @vivekkandeyang6175
      @vivekkandeyang6175 3 years ago

      Features that reveal a user's identity are never used by the ML model itself; they only serve to uniquely identify a sample. So if there are name, roll no., phone no., etc. columns, one can replace them all with a single column of unique IDs and then pass this processed data to the third-party apps.
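A minimal sketch of the "replace identity columns with unique IDs" idea, assuming the data is a list of dicts. The column names (`name`, `roll_no`, `phone`, `score`) and the helper `pseudonymize` are hypothetical. The IDs are random rather than derived from the identity, and the lookup table that maps IDs back to people is assumed to stay with the data owner.

```python
import secrets

# Hypothetical identifying columns; adjust to the real schema.
IDENTIFYING_COLUMNS = ("name", "roll_no", "phone")


def pseudonymize(rows, id_cols=IDENTIFYING_COLUMNS):
    """Replace the identifying columns with one random user_id column.

    Returns (shared_rows, private_lookup). Only shared_rows go to the third
    party; private_lookup (user_id -> original identity) stays in-house.
    """
    identity_to_id = {}  # same person -> same random ID across rows
    private_lookup = {}
    shared_rows = []
    for row in rows:
        identity = tuple(row[c] for c in id_cols)
        if identity not in identity_to_id:
            new_id = secrets.token_hex(8)  # random, so not derivable from the identity
            identity_to_id[identity] = new_id
            private_lookup[new_id] = dict(zip(id_cols, identity))
        shared = {k: v for k, v in row.items() if k not in id_cols}
        shared["user_id"] = identity_to_id[identity]
        shared_rows.append(shared)
    return shared_rows, private_lookup


rows = [
    {"name": "Asha", "roll_no": 12, "phone": "9876543210", "score": 81},
    {"name": "Ravi", "roll_no": 17, "phone": "9123456780", "score": 67},
]
shared, lookup = pseudonymize(rows)
print(shared)  # each row now has only score and user_id
```

The remaining columns can still identify people in combination (see the k-anonymity replies further down), so this is pseudonymisation rather than full anonymisation.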

    • @pdpbasak
      @pdpbasak 3 years ago

      Let the recipient generate a private-public key pair and share the public key with the sender. The sender would encrypt the data with a password, and this password would then be encrypted with the public key. After receiving the two files, the recipient would first decrypt the password with its private key and then use the retrieved password to decrypt the data. OpenSSL could be used for this very easily.

  • @rohitme93
    @rohitme93 3 years ago +13

    Pseudonymisation, data masking or data encryption (by applying some functions to the data)... We can also scale or alter the data and remove/replace the column names so that it won't make sense to the third party. This way the integrity of the data is maintained, and the company itself can easily decrypt it later on.
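A minimal sketch of simple character-level data masking, assuming string-valued columns such as phone or account numbers. The helpers `mask_value` and `mask_rows` and the column names are hypothetical. Unlike encryption, this kind of masking cannot be undone, so the company would keep the original values itself.

```python
def mask_value(value: str, keep_last: int = 4, mask_char: str = "*") -> str:
    """Replace all but the last `keep_last` characters with `mask_char`."""
    # Guard keep_last == 0: value[-0:] would return the whole string.
    visible = value[-keep_last:] if keep_last > 0 else ""
    return mask_char * (len(value) - len(visible)) + visible


def mask_rows(rows, columns, keep_last=4):
    """Return copies of rows with the given (hypothetical) columns masked."""
    return [
        {k: mask_value(str(v), keep_last) if k in columns else v for k, v in row.items()}
        for row in rows
    ]


print(mask_value("9876543210"))  # ******3210
print(mask_rows([{"phone": "9876543210", "age": 30}], {"phone"}))
```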

  • @rk1501
    @rk1501 3 years ago +3

    One approach: suppose we have a column containing city names. Instead of the city names, we can send a, b, c or any other placeholder text. This way privacy is preserved and the dataset won't be affected.

  • @soumyaranjansethi1790
    @soumyaranjansethi1790 3 years ago +2

    Data masking can be done: we can mask the personal information, alter the column names, change some of the values to something different, and at the final stage provide the data in a VM with restricted access to make it more secure.

  • @Balaworld1297
    @Balaworld1297 3 years ago +2

    Hello Sir,
    Keeping the question and the background of the interview in mind, I came to the following conclusions.
    1. First, the role of the data scientist here is to identify whether the third-party vendor would need the data standardised or normalised (based on its distribution), and then give the vendor that dataset.
    2. In some cases they may also have to apply data encryption. Identifying these techniques is probably a key responsibility of the in-house data scientist.

  • @manassavarshnis6811
    @manassavarshnis6811 3 years ago +3

    When considering the privacy of the data, we must first find the features that may cause privacy leakage, such as age, country or gender. The simplest method is to remove these features, but they might contribute to building a better model, so removing them completely is not really a good idea.
    We could use privacy-preserving techniques like SMPC (secure multi-party computation) or homomorphic encryption. A very simple implementation of these techniques can be done with tf-encrypted, which is very similar to using TensorFlow. But this might be a bit complicated, and I am a bit unsure whether it will yield good results during model building.
    Another method is to create synthetic, i.e. artificially generated, data. But this method has some drawbacks too.
    So the method we follow to preserve data privacy depends on the use case and on how important the features are for model building.
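One very naive way to generate synthetic data, sketched under the assumption that the data is a list of dicts: resample every column independently from its own observed values. The helper `synthesize_marginals` and the sample rows are hypothetical. It also illustrates the drawbacks mentioned above: correlations between columns are lost, and real (possibly rare, identifying) values can be copied verbatim into the output. Dedicated synthetic-data generators model the joint distribution instead.

```python
import random


def synthesize_marginals(rows, n, rng=None):
    """Naive synthetic data: sample every column independently from its observed values.

    Each column keeps its marginal distribution, but correlations between
    columns are destroyed, and real values can still appear verbatim.
    """
    rng = rng or random.Random()
    columns = list(rows[0].keys())
    pools = {c: [r[c] for r in rows] for c in columns}
    return [{c: rng.choice(pools[c]) for c in columns} for _ in range(n)]


real = [
    {"age": 25, "city": "Pune", "income": 40000},
    {"age": 52, "city": "Delhi", "income": 95000},
    {"age": 38, "city": "Chennai", "income": 61000},
]
fake = synthesize_marginals(real, n=5, rng=random.Random(42))
print(fake)
```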

  • @gurjeet333
    @gurjeet333 3 years ago +1

    Hi Krish, this interview series is very helpful... please keep up your good work.

  • @rhythmeshwargaming5189
    @rhythmeshwargaming5189 3 years ago +1

    1. We can apply PCA to the features.
    2. Another transformation is WOE (weight of evidence).
    3. Interaction variables.
    We can also use more transformation techniques like these.
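A minimal sketch of weight-of-evidence (WOE) encoding for a categorical column against a binary target, assuming 1 marks the event (e.g. default) and 0 the non-event. The helper `woe_encode` and the `smoothing` parameter are hypothetical; sign conventions differ between texts. The vendor would receive the WOE numbers instead of the category labels, while the label-to-WOE mapping stays in-house.

```python
import math
from collections import defaultdict


def woe_encode(categories, targets, smoothing=0.5):
    """Weight of evidence per category for a binary target (1 = event, 0 = non-event).

    WOE(c) = ln( share of non-events in c / share of events in c ).
    Some texts flip the ratio; `smoothing` avoids log(0) for empty cells.
    """
    events = defaultdict(float)
    non_events = defaultdict(float)
    for c, t in zip(categories, targets):
        if t == 1:
            events[c] += 1
        else:
            non_events[c] += 1
    cats = sorted(set(categories))
    total_events = sum(events.values()) + smoothing * len(cats)
    total_non_events = sum(non_events.values()) + smoothing * len(cats)
    return {
        c: math.log(((non_events[c] + smoothing) / total_non_events)
                    / ((events[c] + smoothing) / total_events))
        for c in cats
    }


woe = woe_encode(["A", "A", "A", "B", "B", "B"], [0, 0, 1, 1, 1, 0])
encoded = [woe[c] for c in ["A", "B", "A"]]  # what would be shared instead of raw labels
print(woe)
```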

  • @yunes7305
    @yunes7305 3 years ago +2

    It can be done through data masking methods: adding noise > microaggregation > rank swapping.
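The three masking methods named above can be sketched for a single numeric column as follows. The helper names and parameters (`scale`, `k`, `p`) are hypothetical, and real statistical-disclosure-control tools use more refined, multivariate versions. Noise perturbs each value; microaggregation replaces each value with the mean of a group of at least `k` similar values; rank swapping exchanges values between records whose ranks are at most `p` apart, so the column's overall distribution is unchanged.

```python
import random


def add_noise(values, scale, rng):
    """Additive Gaussian noise with standard deviation `scale`."""
    return [v + rng.gauss(0.0, scale) for v in values]


def microaggregate(values, k):
    """Univariate microaggregation: sort, cut into groups of k (leftovers join the
    last group), and replace each value with its group mean."""
    if not values:
        return []
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = max(1, len(values) // k)
    out = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        end = len(values) if g == n_groups - 1 else start + k
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            out[i] = mean
    return out


def rank_swap(values, p, rng):
    """Simple rank swapping: pair each unswapped value with a random unswapped
    partner at most `p` ranks above it and exchange their values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = list(values)
    swapped = [False] * len(order)
    for r in range(len(order)):
        if swapped[r]:
            continue
        partners = [s for s in range(r + 1, min(r + p, len(order) - 1) + 1) if not swapped[s]]
        if not partners:
            continue
        s = rng.choice(partners)
        i, j = order[r], order[s]
        out[i], out[j] = values[j], values[i]
        swapped[r] = swapped[s] = True
    return out


incomes = [10, 52, 11, 50, 13, 49]  # hypothetical column (in thousands)
rng = random.Random(1)
print(add_noise(incomes, 1.0, rng))
print(microaggregate(incomes, 3))  # low group -> 11.33..., high group -> 50.33...
print(rank_swap(incomes, 2, rng))
```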

  • @preciousbatta9576
    @preciousbatta9576 3 years ago +2

    This series is so helpful. Thank you.

  • @nisarahmadbhat600
    @nisarahmadbhat600 3 years ago +4

    By assigning an ID in place of the customers' identities.

  • @ruchitdodia9501
    @ruchitdodia9501 3 years ago

    These interview questions help us.
    Thanks for this playlist.

  • @mouryasashank2213
    @mouryasashank2213 3 years ago +1

    Interview questions are so helpful

  • @vaibhavsaran7124
    @vaibhavsaran7124 3 years ago

    Hey Krish, the series is awesome for sure. As for the question: I would first replace the names with some dummy names or values. For phone numbers and bank account numbers, I would either encrypt them using some algorithm or apply some mathematical operations to add noise to the data.

  • @017farazbintariq9
    @017farazbintariq9 1 year ago

    These questions will definitely be helpful.

  • @shyamgurunath5876
    @shyamgurunath5876 3 years ago +1

    A combination of differential privacy, homomorphic encryption and federated machine learning may help with data privacy.
    k-anonymization can also be used to anonymize records in the data.
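A minimal sketch of differential privacy via the Laplace mechanism for a counting query. The helpers `laplace_noise` and `dp_count` and the sample data are hypothetical. A count changes by at most 1 when one person is added or removed (sensitivity 1), so noise with scale `1/epsilon` gives epsilon-differential privacy for a single answer. Answering the same query many times lets the noise average out, which is why real deployments track a total privacy budget.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(rows, predicate, epsilon, rng):
    """Epsilon-differentially-private count. Adding or removing one person changes a
    count by at most 1 (sensitivity 1), so Laplace noise with scale 1/epsilon is used."""
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(7)
customers = [{"age": a} for a in range(18, 88)]  # hypothetical data: ages 18..87
answer = dp_count(customers, lambda r: r["age"] >= 60, epsilon=0.5, rng=rng)
print(answer)  # noisy version of the true count (28)
```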

  • @ankurbhattacharjee3912
    @ankurbhattacharjee3912 3 years ago +2

    Sir, the other day I came across the steganography technique, where important data is hidden inside an ordinary file... probably this technique can be used.

  • @kunalzaveri4191
    @kunalzaveri4191 3 years ago +1

    Yes, this type of question helps us, sir.

  • @ajaykushwaha-je6mw
    @ajaykushwaha-je6mw 3 years ago

    We must encrypt the data, then send it.
    Every company has a portal for sharing data, so the data needs to be shared on that provided portal.
    We are extremely happy with the interview playlist.

  • @nandhishnandhu201
    @nandhishnandhu201 3 years ago

    We can use homomorphic encryption so that the third party can perform computations on the data without decrypting it.

  • @asurya
    @asurya 3 years ago +3

    Through an API, we can give the third-party consultant access.

  • @kuldeep27396
    @kuldeep27396 3 years ago

    1. For data privacy we can use a cloud platform. For example, I can use AWS S3 with suitable IAM permissions, ask the client to put the data in the cloud, and then they can apply ML algorithms to that data in SageMaker or elsewhere.
    2. We can use virtual machines for the people we are sharing the data with. I have worked with banks; they share a lot of data, and it is all inside Citrix systems, so we can only work inside that virtual machine. We cannot take screenshots or copy data outside of that virtual machine.

  • @sriramesv
    @sriramesv 1 year ago

    Encrypting the sensitive information using PCA

  • @miranbaban9554
    @miranbaban9554 3 years ago

    Dear Krish, I have a question about creating your own machine learning model from scratch, for example a model like KNN or a decision tree. Would you please upload a video about this?

  • @yashodhanvivek8086
    @yashodhanvivek8086 3 years ago +1

    At the compliance level, GDPR or CCPA can be applied for data privacy.

  • @siddivinayak85
    @siddivinayak85 3 years ago +1

    Provide the principal components after applying any prerequisite data encoding techniques.

  • @raph8240
    @raph8240 3 years ago +1

    You can convert the features into components using PCA

  • @gytry9155
    @gytry9155 3 years ago

    In terms of data security, we can apply a scale-down method and then send the data.

  • @shilpaprusty3319
    @shilpaprusty3319 3 years ago

    We can use masking techniques to mask some features.

  • @niranjanjamkhande3773
    @niranjanjamkhande3773 3 years ago

    Give them PCA-applied data but without changing the column names. By doing that, the essence of the data will be preserved without exposing the real data.
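A minimal pure-Python sketch of sharing principal-component scores instead of raw features, keeping only the first component. The helpers `pca_fit`, `pca_transform` and `pca_inverse` are hypothetical, and power iteration stands in for a library SVD. Note that PCA is a dimensionality reduction, not encryption: whoever holds the column means and axes can map the scores back (exactly only when the dropped components carry no variance), so those stay in-house, and dropping components loses information, as the reply below points out.

```python
import math


def pca_fit(rows, n_iter=500):
    """Return (column means, first principal axis) using power iteration on the
    covariance matrix. Assumes numeric rows, at least two samples, non-constant data."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    axis = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * axis[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        axis = [x / norm for x in w]
    return means, axis


def pca_transform(rows, means, axis):
    """Project each row onto the principal axis: these scores are what would be shared."""
    return [sum((r[j] - means[j]) * axis[j] for j in range(len(axis))) for r in rows]


def pca_inverse(scores, means, axis):
    """Map scores back to the original columns using the private means and axis."""
    return [[means[j] + s * axis[j] for j in range(len(axis))] for s in scores]


data = [[1, 2], [2, 4], [3, 6], [4, 8]]  # hypothetical two-feature dataset
means, axis = pca_fit(data)
scores = pca_transform(data, means, axis)
back = pca_inverse(scores, means, axis)
print(scores)
```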

  • @IrfanAhmad-od2sn
    @IrfanAhmad-od2sn 3 years ago

    The same way banking clients share data on Kaggle to solve their problems 😉 They don't share the actual feature names and values; they probably use a defined encryption and decryption technique.

  • @abhishekbourai1832
    @abhishekbourai1832 3 years ago

    Creating synthetic data from the given data.

  • @fenilpatel39
    @fenilpatel39 3 years ago

    Data encryption... hashing is one way.
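Hashing is one-way, so strictly it is pseudonymisation rather than encryption. A plain unsalted hash of a low-entropy field such as a 10-digit phone number can be reversed by hashing every candidate number, so a keyed hash (HMAC) is the safer variant. A minimal sketch using the standard-library `hmac` module; the key handling and the helper `keyed_pseudonym` are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key, generated and kept by the data owner; never shared.
SECRET_KEY = secrets.token_bytes(32)


def keyed_pseudonym(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministic HMAC-SHA256 pseudonym: the same input always maps to the same
    token, but without the key the token cannot be recomputed from a guessed input."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


token = keyed_pseudonym("9876543210")
print(token)  # 64 hex characters; stable for this key, so joins across tables still work
```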

  • @RaviShankar-cg5wx
    @RaviShankar-cg5wx 3 years ago +1

    1. Apply PCA to the data.
    2. Do feature scaling so that the data is safe and we can easily transform it back.
    3. Apply data encryption techniques.
    4. Dynamic data masking.
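A minimal sketch of point 2, min-max scaling whose parameters stay with the data owner so the scaling can be reversed later. The helper names and the salary column are hypothetical. Scaling keeps the order and relative gaps between values, so it mainly hides units; anyone who knows or guesses the minimum and maximum can undo it.

```python
def fit_minmax(values):
    """Learn the scaling parameters; these stay with the data owner."""
    return min(values), max(values)


def minmax_transform(values, lo, hi):
    """Map values into [0, 1]; this is what would be shared."""
    span = (hi - lo) or 1.0  # avoid division by zero for a constant column
    return [(v - lo) / span for v in values]


def minmax_inverse(scaled, lo, hi):
    """Undo the scaling using the private parameters."""
    span = (hi - lo) or 1.0
    return [lo + s * span for s in scaled]


salaries = [32000, 45000, 51000, 120000]  # hypothetical column
lo, hi = fit_minmax(salaries)
shared = minmax_transform(salaries, lo, hi)
restored = minmax_inverse(shared, lo, hi)
print(shared)
```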

  • @divyaharshad9985
    @divyaharshad9985 10 months ago

    Will help

  • @prashanthvlogs4801
    @prashanthvlogs4801 3 years ago +2

    👍👍

  • @0SIGMA
    @0SIGMA 3 years ago

    Some simple techniques, I believe, are upsampling, downsampling, converting the data in terms of PCA, or just putting the data on a pen drive and delivering it directly.

  • @shubhodeepbhowmick
    @shubhodeepbhowmick 3 years ago +1

    Homomorphic encryption: this is the area of research that deals with creating ML models on encrypted data. The client can encrypt the data and send it to the third-party vendor, who can then use this technique to create models on the encrypted data and share the results with the client. The client can then decrypt the results with the private key. This way the data privacy issue can be handled.
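A toy sketch of the workflow described above using the Paillier cryptosystem, which is additively homomorphic: the vendor can add encrypted values and multiply them by plaintext constants, and only the client can decrypt the result. The tiny fixed primes make this insecure (real keys use moduli of 2048 bits or more), and Paillier alone supports sums and weighted sums rather than arbitrary model training; fully homomorphic schemes are needed for that. The helper names and the sample ages are hypothetical.

```python
import math
import random

# Toy Paillier keys from tiny fixed primes: illustration only, NOT secure.
P, Q = 293, 433
N = P * Q
N_SQ = N * N
G = N + 1                                          # standard simple choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1), private
MU = pow(LAM, -1, N)                               # private; valid because G = N + 1 (Python 3.8+)


def encrypt(m, rng=random):
    """Client side: encrypt an integer 0 <= m < N with fresh randomness."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c):
    """Client side: needs the private values LAM and MU."""
    return ((pow(c, LAM, N_SQ) - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """Vendor side: multiplying ciphertexts adds the hidden plaintexts."""
    return (c1 * c2) % N_SQ


def scale_encrypted(c, k):
    """Vendor side: raising a ciphertext to k multiplies the hidden plaintext by k."""
    return pow(c, k, N_SQ)


ages = [34, 41, 29]                        # hypothetical client data
ciphertexts = [encrypt(a) for a in ages]   # only these reach the vendor
total = ciphertexts[0]
for c in ciphertexts[1:]:
    total = add_encrypted(total, c)        # vendor never sees 34, 41 or 29
print(decrypt(total))                      # client decrypts: 104
```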

  • @saiprakash5224
    @saiprakash5224 3 years ago

    Feature scaling, like a min-max scaler, or log transformation techniques!! 🤔

  • @nisargchodavadiya8909
    @nisargchodavadiya8909 3 years ago +2

    Don't provide the original features; an example is the credit card fraud dataset with feature names V1, V2, V3, ..., V28.
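A minimal sketch of replacing real column names with generic ones such as V1...Vn, keeping the mapping private. The helper `anonymize_columns` and the sample columns are hypothetical. Renaming alone hides what a column means but not its values; in the Kaggle credit card fraud dataset mentioned above, the V1...V28 columns are described as the output of a PCA transformation, so the values were transformed as well.

```python
def anonymize_columns(rows):
    """Rename columns to V1..Vn; the mapping back to real names stays in-house."""
    names = list(rows[0].keys())
    mapping = {name: f"V{i}" for i, name in enumerate(names, start=1)}
    shared = [{mapping[k]: v for k, v in row.items()} for row in rows]
    return shared, mapping


rows = [{"age": 42, "balance": 1200.5, "tenure": 7}]
shared, mapping = anonymize_columns(rows)
print(shared)  # [{'V1': 42, 'V2': 1200.5, 'V3': 7}]
```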

  • @sandeepanmahapatra4888
    @sandeepanmahapatra4888 3 years ago

    Use the pickle technique?

  • @harshitawasthi6744
    @harshitawasthi6744 3 years ago +1

    If it's text data, convert it into binary form. If it's image data, convert it into pixel values.

  • @MrHamidmahmud
    @MrHamidmahmud 3 years ago

    I would encode the data with a special company-built approach, so that afterwards we could decode it back to its original form if required.

  • @CrusadeVoyager
    @CrusadeVoyager 3 years ago +1

    Data obfuscation, Data masking

  • @shanbhag003
    @shanbhag003 3 years ago +1

    If it's a CSV file, we can use a security key to protect the data from intruders.

    • @krishnaik06
      @krishnaik06  3 years ago +1

      But how will the third party access it?

    • @shanbhag003
      @shanbhag003 3 years ago

      @krishnaik06 If they are vendors working for our company, then we provide them with the password.

    • @krishnaik06
      @krishnaik06  3 years ago +1

      They should not be able to see the real data.

    • @niranjanjamkhande3773
      @niranjanjamkhande3773 3 years ago

      @krishnaik06 Give them PCA-applied data but without changing the column names. By doing that, the essence of the data will be preserved without exposing the real data.

    • @vivekkandeyang6175
      @vivekkandeyang6175 3 years ago

      @niranjanjamkhande3773 You will be losing information.

  • @VishalGupta_5083
    @VishalGupta_5083 3 years ago

    We can give them a PCA-transformed dataset.

  • @bhagwanpatil5354
    @bhagwanpatil5354 3 years ago +1

    Convert the data into principal components using PCA.

  • @SatyamKumar-gd1lz
    @SatyamKumar-gd1lz 3 years ago

    Till now I can only think of data encryption.

  • @techinfo89
    @techinfo89 2 years ago

    Where is the video?

  • @reenasheoran893
    @reenasheoran893 3 years ago

    Scale the data and apply PCA... then encrypt it and send it.

  • @maheshsharma7690
    @maheshsharma7690 3 years ago

    We can use an S3 IAM role for data-view permissions and a KMS key for file encryption.

  • @JainmiahSk
    @JainmiahSk 3 years ago +1

    👍

  • @saubhagyamarwaha8518
    @saubhagyamarwaha8518 3 years ago

    Hashing ?

  • @AnimeFanClub786
    @AnimeFanClub786 3 years ago

    Change the feature names and normalise the data

  • @jibranmohammad7372
    @jibranmohammad7372 3 years ago

    Differential privacy

  • @ayushiagarwal7342
    @ayushiagarwal7342 2 years ago

    Scaled data

  • @anmolwadali9227
    @anmolwadali9227 3 years ago

    K-anonymity
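A minimal sketch of checking k-anonymity: a table is k-anonymous if every combination of quasi-identifier values (here age and ZIP code) is shared by at least k rows. The generalization rules in `generalize` and the sample patients are hypothetical. k-anonymity alone does not help when everyone in a group has the same sensitive value, which is what extensions such as l-diversity address.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


def generalize(row):
    """Hypothetical generalization rules: age -> 10-year band, 5-digit ZIP -> 3-digit prefix."""
    low = row["age"] // 10 * 10
    return {**row, "age": f"{low}-{low + 9}", "zip": row["zip"][:3] + "**"}


patients = [
    {"age": 34, "zip": "56001", "diagnosis": "flu"},
    {"age": 36, "zip": "56002", "diagnosis": "cold"},
    {"age": 41, "zip": "56011", "diagnosis": "flu"},
    {"age": 47, "zip": "56019", "diagnosis": "asthma"},
]
qi = ("age", "zip")
print(k_anonymity(patients, qi))                           # 1: every row is unique
print(k_anonymity([generalize(p) for p in patients], qi))  # 2
```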

  • @rahulgyawali
    @rahulgyawali 3 years ago

    In encrypted form via an API. Share the encryption key along with the client ID with the respective team.

  • @anime_on_data7594
    @anime_on_data7594 3 years ago

    Blockchain

  • @maulana6969
    @maulana6969 3 years ago

    The company could move all the data to a cloud system where the data is stored and all ML operations can be performed quickly, without any option to download.
    If such a platform does not exist, then some companies should try to create such clouds with RAM and core access.
    I am from a non-programming background, so this solution is based on my knowledge of CSE.