LLMs: Data Privacy and Protection, PII Anonymisation
- Published on 16 Sep 2024
- python.langcha...
#datascience #machinelearning #deeplearning #datanalytics #predictiveanalytics
#artificialintelligence #generativeai #largelanguagemodels #naturallanguageprocessing
#computervision #transformers #embedding #graphml #graphdatascience
#datavisualization #businessintelligence #montecarlosimulation #simulation #optimization
#python #aws #azure #gcp
If you found this content useful, please consider sharing it with others who might benefit. Your support is greatly appreciated :)
Thank you very much
Subbed. Are there any applications of LLMs for maintaining/enhancing "crowdsourced data quality" or "improving transparency and trustworthiness of the data anonymization process"?
AFAIK, Microsoft Presidio is the best one for data anonymization and PII detection.
@SridharKumarKannam ahan. Looking forward to seeing any other LLM-based videos on crowdsourced data quality. Thanks!
Is any database needed, or is it stored in LangChain's buffer memory? I was thinking from an application-level perspective: when multiple prompts reach the LLM at the same timestamp, how does it de-mask the right prompt?
It's kept in an in-memory buffer. What you suggested is useful for a production-level application: store the mappings in an external database. I'm not clear about your second question; it should work fine even with multiple prompts at the same time.
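To make the idea in this reply concrete, here is a minimal sketch of keeping a separate placeholder mapping per request, so concurrent prompts are de-masked against their own mapping. This is not LangChain's or Presidio's implementation: `MappingStore`, `anonymize`, `deanonymize`, and the email-only regex are all hypothetical names and simplifications, and the dictionary stands in for an external database.

```python
import re
import uuid

# Assumption: only e-mail addresses are treated as PII here; a real setup
# would call a PII engine such as Presidio instead of a single regex.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class MappingStore:
    """Hypothetical stand-in for an external database.

    Stores one mapping per request: request_id -> {placeholder: original}.
    """

    def __init__(self):
        self._data = {}

    def save(self, request_id, mapping):
        self._data[request_id] = dict(mapping)

    def load(self, request_id):
        return self._data[request_id]


def anonymize(prompt, store):
    """Mask e-mails in one prompt and persist its mapping under a fresh id."""
    request_id = str(uuid.uuid4())
    mapping = {}

    def _swap(match):
        placeholder = f"<EMAIL_{len(mapping) + 1}>"
        mapping[placeholder] = match.group(0)
        return placeholder

    masked = EMAIL_RE.sub(_swap, prompt)
    store.save(request_id, mapping)
    return request_id, masked


def deanonymize(request_id, text, store):
    """Restore originals using only the mapping that belongs to this request."""
    for placeholder, original in store.load(request_id).items():
        text = text.replace(placeholder, original)
    return text
```

Because each mapping is looked up by its own request id, two prompts that both produce a `<EMAIL_1>` placeholder are still restored to different addresses.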
Thank you very much. Very nice, crisp and clear presentation. Lots of learning. Can you please share the code?
python.langchain.com/docs/guides/privacy/presidio_data_anonymization/qa_privacy_protection
Thank You
Very informative videos, sir. Please add a link or any reference to the notebook.
link added in the description
Nice explanation. Where can we find the code, please?
python.langchain.com/docs/guides/privacy/presidio_data_anonymization/qa_privacy_protection
@SridharKumarKannam Thank you
How do I handle PII in tabular data such as CSV or Excel files?
afaik, there isn't any direct way unless you turn your tabular data into a string.
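One way to read this reply is to treat every cell as a string and anonymize it on its own. The sketch below does that for CSV text using only the standard library. `anonymize_text` is a hypothetical stand-in (a single e-mail regex) for a real PII engine, and `anonymize_csv` and its `pii_columns` parameter are illustrative names, not part of Presidio or LangChain.

```python
import csv
import io
import re

# Assumption: an e-mail regex stands in for a real PII detector.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def anonymize_text(value):
    """Hypothetical placeholder for a call into a PII anonymizer."""
    return EMAIL_RE.sub("<EMAIL>", value)


def anonymize_csv(csv_text, pii_columns):
    """Anonymize the named columns of CSV text, cell by cell."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for col in pii_columns:
            # Each cell is handled as an independent string.
            row[col] = anonymize_text(row[col])
        writer.writerow(row)
    return out.getvalue()
```

For example, `anonymize_csv(data, ["note"])` rewrites only the `note` column and leaves the others untouched. Excel files would first need to be exported or read into rows of strings in the same way.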
I was wondering how exactly I can do this with ConversationalRetrievalChain, because I am not using LCEL, as it's still buggy and a bit confusing.
The Presidio library is from Microsoft. LangChain simply integrated it with their framework. You can use standalone Presidio - github.com/microsoft/presidio
Any library or language pack that we can use for Indian data?
I've not used anything specific to India. I'll let you know if I come across anything.
@SridharKumarKannam Yes please. If there's anything we can do to train on such data, we would really appreciate a video on it.