Accelerating drug discovery with AI: Insights from Isomorphic Labs

  • Published on 28 Nov 2024

Comments • 11

  • @NachuanShan
    7 months ago +7

    Also, at 22:08, commenting on Lukas' question: data in the biological world differs from NLP or CV data in various ways. To name a few:
    1. In biology, experimental data is only an estimate of the physical ground truth and is often inconsistent, whereas in many other domains the corpus used for training is essentially the same as what the model sees in the real world. So the intrinsic noise in the data limits the ceiling on how well a model can be evaluated. Since the data is not ground truth, there is a larger gap between model output and reality, even if the model is perfect on the test data.
    2. The lack of data is real, partly because bio data is expensive. In CV, an annotator can label a dozen or even a hundred pictures per hour, and it costs less than $100. But in the bio world, a single row of data costs $100-$1000 on average, even $10k or more for things like protein structures, and takes days or weeks to generate. It also requires a high level of expertise to conduct these experiments, and repeats are often needed to analyze the intrinsic variance of the data.
    3. The formats of bio data are very diverse. For an LLM, text is all you need; add voice and moving pictures and we can train Sora. But in biology there are hundreds of tasks (structure, affinity, stability, toxicity...), and each task has many different experiment types.
    Well, if you are interested in more on this, my Twitter handle is also NachuanShan. I work at BioMap as a data product manager, building protein language models.

  • @ayoubalkarim9058
    7 months ago +4

    As mentioned in a previous comment, a significant challenge in applying machine learning to drug discovery projects is the scarcity of robust, well-structured data. For instance, a major factor contributing to the failure of drug discovery endeavours is suboptimal ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties. The landscape could be transformed if we could develop models capable of predicting the outcomes of in vitro assays, allowing us to streamline the selection of well-optimized candidates for pre-clinical trials. However, the publicly available ADMET data is notably deficient in both quality and quantity, which leads to models that lack robustness.

  • @bamh1re318
    5 months ago +1

    One issue with LLMs for drug discovery is something like the Heisenberg Uncertainty Principle: measuring the system changes it. For example, scRNA-seq generates big data, which is great for ML/LLMs. But the data is not realistic, because you need to dissociate the cells to do the sequencing.

  • @JagdeepPaniAtEarth
    6 months ago

    Excellent questions by Lukas. Insightful discussion.

  • @Yogesh-rg1if
    6 months ago +1

    .. becoming comfortable with being uncomfortable ❤️

  • @MindFieldMusic
    7 months ago

    Great conversation. Love this topic.

  • @eliosb989
    6 months ago

    Great questions 👍

  • @dhruvpatel4948
    7 months ago

    Very insightful and informative

  • @goldnutter412
    6 months ago

    The brain is not the mind
    The brain is not the mind
    The brain is not the mind
    Demis gonna win #DeepMind #EZ

  • @FLORIDIANMILLIONAIRE
    6 months ago

    Where is the data going to come from? There are strict HIPAA regulations, especially in the USA.

  • @dontwannabefound
    4 months ago

    Interviewer really irritating