How to detect and Mask PII data in Apache Hudi Data Lake | Hands on Lab

  • Published on 22 Oct 2024
  • Code :
    github.com/sou...
    Read More
    Detect and process sensitive data
    docs.aws.amazo...
    EntityDetectorConfiguration
    docs.aws.amazo...
    Why is it important to mask PII data in a data lake?
    Masking PII (Personally Identifiable Information) data in a data lake is important because it helps to protect the privacy and security of individuals by preventing unauthorized access to sensitive information. This can include information such as names, addresses, Social Security numbers, and other identifying information that could be used for identity theft or other malicious purposes. Additionally, masking PII data can also help organizations comply with data privacy regulations such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA).

Comments • 4

  • @ritikasharma7817 7 months ago

    Hey Soumil, thank you for the detailed video.
    I just have one question: what if you have an extra column, let's say CONTEXT, and under this column you have a sentence that contains a person's name?
    For example: “Ritika is under depression”
    How am I going to remove this name from within the column?
    Please reply.

    • @SoumilShah 7 months ago

      It should work for that too
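
To make the question above concrete, here is a minimal sketch of masking a name inside a free-text column. It is not the code from the video and does not use the AWS Glue detector. The fixed name list (`KNOWN_NAMES`) and the helper names `find_name_spans` and `redact` are hypothetical stand-ins for whatever PII detector reports the character offsets of each match.

```python
import re

# Hypothetical list of names; a real pipeline would get spans from a
# PII/NER detector instead of a fixed list.
KNOWN_NAMES = ["Ritika", "Soumil"]


def find_name_spans(text, names=KNOWN_NAMES):
    """Return (start, end) character offsets of each known name in text."""
    if not names:
        return []
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b"
    )
    return [m.span() for m in pattern.finditer(text)]


def redact(text, spans, mask="*"):
    """Replace each span with mask characters of the same length."""
    out = []
    cursor = 0
    for start, end in sorted(spans):
        out.append(text[cursor:start])      # keep text before the match
        out.append(mask * (end - start))    # mask the match itself
        cursor = end
    out.append(text[cursor:])               # keep the remaining text
    return "".join(out)


row = {"id": 1, "CONTEXT": "Ritika is under depression"}
row["CONTEXT"] = redact(row["CONTEXT"], find_name_spans(row["CONTEXT"]))
print(row["CONTEXT"])  # ****** is under depression
```

Only the detected span is masked, so the rest of the sentence stays usable. Swapping the name list for a real detector changes `find_name_spans` only.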

  • @arunr2265 1 year ago

    Thanks, Soumil, for the nice video. How do we mask and unmask data based on user roles? Also, do we need to maintain two copies of the data, one masked and one unmasked, depending on the use case?

    • @SoumilShah 1 year ago

      Hello,
      I am not sure why you would need two copies.
      Ideally, you don't unmask PII data.
      For analytics, I don't think credit card numbers and other sensitive information are needed.
      Could you elaborate, please?