
Synthetic DATA Generation using LANGCHAIN 🦜️🔗

  • Published on Oct 26, 2023
  • In this video, I will show you how to create synthetic data using LangChain and OpenAI models.
    Synthetic data refers to artificially generated data that imitates the characteristics of real data without containing any information from actual individuals or entities. It is typically created through mathematical models, algorithms, or other data generation techniques. Synthetic data can be used for a variety of purposes, including testing, research, and training machine learning models, while preserving privacy and security.
    Happy Learning 😎
    👉🏼 Links:
    GitHub repo: github.com/sudarshan-koirala/...
    LangChain documentation: python.langchain.com/docs/use...
    ------------------------------------------------------------------------------------------
    ☕ Buy me a Coffee: ko-fi.com/datasciencebasics
    ✌️Patreon: / datasciencebasics
    ------------------------------------------------------------------------------------------
    🔗 🎥 Other videos you might find helpful:
    🔥 Databricks playlist: • 30 Days Of DataBricks
    ⛓️ Langflow: • ⛓️ langflow | UI For 🦜...
    ⛓️ Flowise: • Flowise | UI For 🦜️🔗 L...
    🔥Chainlit playlist: • Chainlit
    🦜️🔗 LangChain playlist: • LangChain
    ------------------------------------------------------------------------------------------
    🤝 Connect with me:
    📺 YouTube: www.youtube.com/@datascienceb...
    👔 LinkedIn: / sudarshan-koirala
    🐦 Twitter: / mesudarshan
    🔉Medium: / sudarshan-koirala
    💼 Consulting: topmate.io/sudarshan_koirala
    #langchain #llm #synthetic #syntheticdata #datasciencebasics
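The description notes that synthetic data is typically created through mathematical models or algorithms. As a minimal sketch of that idea only (it is not the LangChain and OpenAI approach used in the video), the Python below fits a very simple model to a few hypothetical rows, a normal distribution for numeric columns and observed frequencies for categorical ones, and then samples new rows from it. The column names, values, and helper functions (`fit_column_models`, `sample_rows`) are illustrative assumptions.

```python
import random
import statistics
from collections import Counter

# Hypothetical "real" records; in practice these would come from your own dataset.
REAL_ROWS = [
    {"diagnosis_code": "J20.9", "total_charge": 500.0},
    {"diagnosis_code": "M54.5", "total_charge": 350.0},
    {"diagnosis_code": "J20.9", "total_charge": 620.0},
    {"diagnosis_code": "E11.9", "total_charge": 410.0},
]


def fit_column_models(rows):
    """Fit one independent model per column: normal for floats, frequencies otherwise."""
    models = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        if all(isinstance(v, float) for v in values):
            # Numeric column: summarise by mean and sample standard deviation.
            models[column] = ("normal", statistics.mean(values), statistics.stdev(values))
        else:
            # Categorical column: remember each observed value and how often it occurs.
            counts = Counter(values)
            models[column] = ("categorical", list(counts), list(counts.values()))
    return models


def sample_rows(models, n, seed=None):
    """Draw n synthetic rows; each column is sampled independently of the others."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for column, model in models.items():
            if model[0] == "normal":
                _, mu, sigma = model
                # Clamp at zero so a charge can never come out negative.
                row[column] = round(max(0.0, rng.gauss(mu, sigma)), 2)
            else:
                _, categories, weights = model
                row[column] = rng.choices(categories, weights=weights)[0]
        rows.append(row)
    return rows


synthetic = sample_rows(fit_column_models(REAL_ROWS), n=5, seed=42)
for row in synthetic:
    print(row)
```

Because each column is sampled independently, relationships between columns (for example, correlations) are not preserved, and categorical fields can only take values that appear in the input, which bears on the lookup-value question raised in the comments.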

Comments • 27

  • @aanchalrawat
    @aanchalrawat 5 months ago

    Really Amazing

  • @seththunder2077
    @seththunder2077 9 months ago +1

    This is amazing! Could you please make a more comprehensive version of this that uses real data as an example (it doesn't have to be medical, just so we can see the full procedure)?

  • @pseudoartist
    @pseudoartist 3 months ago

    Awesome, brother, awesome!

  • @henkhbit5748
    @henkhbit5748 9 months ago

    Interesting video 👍 I'm curious: if you have fields that are lookup values with only 4 possible values, are the generated values still valid after generation? Also, for fields produced by some algorithm, for example a bank number, does the generated value still pass that field's check constraint, based on the few-shot examples? And can this also be done with an open-source LLM?

  • @devyanshrastogi
    @devyanshrastogi 9 months ago

    I saw your video about fine-tuning Llama 2 on your own data. Could you please make a similar video on fine-tuning Zephyr or Mistral 7B on Google Colab using Abhishek Thakur's AutoTrain, and then show how to use the fine-tuned model?

  • @hadikhantec
    @hadikhantec 2 months ago

    Thanks! That's a very practical use case. Can you make a full-scale video?

  • @teja3925
    @teja3925 13 days ago

    Hello,
    How can we generate data when there are two tables with a primary key/foreign key (PK/FK) relationship? Is the model capable of generating related data like that?

  • @harshadahadawale9533
    @harshadahadawale9533 7 months ago

    I have built an application using the same code, but I'm getting an output parser error when passing sample data to the LangChain library.

  • @nasiksami2351
    @nasiksami2351 3 months ago

    Great tutorial! Is there any open-source implementation available of this approach?

  • @ShubhamKumar-je5dm
    @ShubhamKumar-je5dm 5 months ago

    I'm using AzureChatOpenAI instead of ChatOpenAI and it's not working. Any idea?

  • @Player13.917
    @Player13.917 2 months ago

    I am unable to create two-level nested JSON using this example. Can anyone help?

  • @prashantt022
    @prashantt022 7 months ago

    Good content, very helpful. Could you advise?
    If we check the statistical correlation between the real and synthetic data, will it be above 90%?

    • @datasciencebasics
      @datasciencebasics 7 months ago +1

      Personally, I haven't checked it. That would be a good check, though, before using this in real use cases.

  • @sebiraj149
    @sebiraj149 8 months ago

    Could you let me know which versions of OpenAI and LangChain were used in this video?

    • @datasciencebasics
      @datasciencebasics 8 months ago

      I used the latest versions available when the video was uploaded (Oct 27), so you can check them by searching for each package at this link:
      pypi.org/

  • @user-yi8lk1ki9y
    @user-yi8lk1ki9y 6 months ago

    Hi, good video. Can we use LangChain for multi-table data generation with referential integrity?

    • @ankit85jain
      @ankit85jain 6 months ago

      This video just explains the same example that LangChain gives in its documentation. I am also looking for examples of data generation based on more real-world scenarios.

  • @ankit85jain
    @ankit85jain 7 months ago

    Could you suggest which other open-source models we can use to generate synthetic data?

    • @datasciencebasics
      @datasciencebasics 7 months ago

      I haven't tried other open-source models myself; you can try them and see if it works. Also, one thing to check is how statistically close the synthetic data is to the real data.

  • @orlandocastellanos9263
    @orlandocastellanos9263 9 months ago

    Which framework is best for enterprise applications, Haystack or LangChain?

    • @datasciencebasics
      @datasciencebasics 9 months ago

      I haven't explored Haystack yet, so I can't say which one, but knowing both might be beneficial!

    • @orlandocastellanos9263
      @orlandocastellanos9263 9 months ago

      @datasciencebasics Thanks for the recommendation, but is LangChain good enough to work at scale in production?

    • @datasciencebasics
      @datasciencebasics 9 months ago

      It depends on what kind of app you want to build and deploy. The underlying models are the key, as LangChain is just the framework. That said, this field is still evolving, and constant upgrades are necessary.

  • @sivaprasadatla
    @sivaprasadatla months ago

    Please share an approach for synthetic data generation using Azure OpenAI, as I have an Azure OpenAI key.

    • @datasciencebasics
      @datasciencebasics months ago

      Hello, you can quickly use Azure OpenAI by importing Azure OpenAI from LangChain.
      For reference, here is the link: python.langchain.com/v0.2/docs/integrations/llms/azure_openai/

    • @sivaprasadatla
      @sivaprasadatla months ago

      @datasciencebasics Thanks a lot! I will check.
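Several comments ask how close the generated data is to real data, and the channel's replies suggest checking this before relying on it. A minimal sketch of such a check, assuming two numeric columns with hypothetical names and values, compares the column means and the Pearson correlation between the columns in the real and synthetic sets. The `pearson` and `compare` helpers and the 0.1 tolerance are illustrative assumptions, not part of LangChain.

```python
import math
import statistics

# Hypothetical numeric columns from a real dataset and from a synthetic one.
real_columns = {
    "total_charge": [500.0, 350.0, 620.0, 410.0, 780.0],
    "insurance_claim_amount": [450.0, 300.0, 550.0, 380.0, 700.0],
}
synthetic_columns = {
    "total_charge": [480.0, 390.0, 600.0, 450.0, 720.0],
    "insurance_claim_amount": [430.0, 350.0, 520.0, 400.0, 650.0],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


def compare(real, synthetic, x, y):
    """Mean differences (synthetic minus real) plus the x/y correlation in each set."""
    return {
        f"mean_diff_{x}": statistics.mean(synthetic[x]) - statistics.mean(real[x]),
        f"mean_diff_{y}": statistics.mean(synthetic[y]) - statistics.mean(real[y]),
        "corr_real": pearson(real[x], real[y]),
        "corr_synthetic": pearson(synthetic[x], synthetic[y]),
    }


report = compare(real_columns, synthetic_columns, "total_charge", "insurance_claim_amount")
for name, value in report.items():
    print(f"{name}: {value:.3f}")

# Hypothetical acceptance rule: the two correlations should differ by less than 0.1.
TOLERANCE = 0.1
close_enough = abs(report["corr_real"] - report["corr_synthetic"]) < TOLERANCE
print("correlation preserved:", close_enough)
```

Matching one correlation is a weak signal on its own; a fuller evaluation would compare more statistics and the shape of each column's distribution.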