Data Science for Computational Drug Discovery using Python (Part 1)

  • Published on 17 Sep 2024
  • In this end-to-end bioinformatics/cheminformatics tutorial, I show you step by step how to use data science in a computational drug discovery project: we reproduce the research work of Delaney by predicting the solubility of molecules in Python using the scikit-learn, rdkit and pandas libraries.
    ✅Check out Part 2 of this video: • Data Science for Compu...
    🌟 Buy me a coffee: www.buymeacoff...
    ⭕ Links for this video:
    ✅Code: github.com/dat...
    ✅Delaney's ORIGINAL ARTICLE entitled "ESOL:  Estimating Aqueous Solubility Directly from Molecular Structure" pubs.acs.org/d...
    ✅Read my EDITORIAL ARTICLE entitled "Maximizing computational tools for successful drug discovery" www.tandfonlin...
    ⭕ Playlist:
    Check out our other videos in the following playlists.
    ✅ Data Science 101: bit.ly/datapro...
    ✅ Data Science YouTuber Podcast: bit.ly/datasci...
    ✅ Data Science Virtual Internship: bit.ly/datapro...
    ✅ Bioinformatics: bit.ly/dataprof...
    ✅ Data Science Toolbox: bit.ly/datapro...
    ✅ Streamlit (Web App in Python): bit.ly/datapro...
    ✅ Shiny (Web App in R): bit.ly/datapro...
    ✅ Google Colab Tips and Tricks: bit.ly/datapro...
    ✅ Pandas Tips and Tricks: bit.ly/datapro...
    ✅ Python Data Science Project: bit.ly/datapro...
    ✅ R Data Science Project: bit.ly/datapro...
    ⭕ Subscribe:
    If you're new here, it would mean the world to me if you would consider subscribing to this channel.
    ✅ Subscribe: www.youtube.co...
    ⭕ Recommended Tools:
    Kite is a FREE AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite and I love it!
    ✅ Check out Kite: www.kite.com/g...
    ⭕ Recommended Books:
    ✅ Hands-On Machine Learning with Scikit-Learn : amzn.to/3hTKuTt
    ✅ Data Science from Scratch : amzn.to/3fO0JiZ
    ✅ Python Data Science Handbook : amzn.to/37Tvf8n
    ✅ R for Data Science : amzn.to/2YCPcgW
    ✅ Artificial Intelligence: The Insights You Need from Harvard Business Review: amzn.to/33jTdcv
    ✅ AI Superpowers: China, Silicon Valley, and the New World Order: amzn.to/3nghGrd
    ⭕ Stock photos, graphics and videos used on this channel:
    ✅ 1.envato.marke...
    ⭕ Follow us:
    ✅ Medium: bit.ly/chanin-m...
    ✅ Facebook: / dataprofessor
    ✅ Website: dataprofessor.org/ (Under construction)
    ✅ Twitter: / thedataprof
    ✅ Instagram: / data.professor
    ✅ LinkedIn: / chanin-nantasenamat
    ✅ GitHub 1: github.com/dat...
    ✅ GitHub 2: github.com/cha...
    ⭕ Disclaimer:
    Recommended books and tools are affiliate links that give me a portion of sales at no cost to you, which will contribute to the improvement of this channel's content.
    #dataprofessor #bioinformatics #drugdiscovery #drugdesign #drug #drugs #molecule #molecules #machinelearning #lecture #dataprofessor #bigdata #QSAR #QSPR #machinelearning #datascienceproject #randomforest #decisiontree #svm #neuralnet #neuralnetwork #supportvectormachine #python #learnpython #pythonprogramming #datascience #datamining #bigdata #datascienceworkshop #dataminingworkshop #dataminingtutorial #datasciencetutorial #ai #artificialintelligence #tutorial #dataanalytics #dataanalysis #machinelearningmodel

Comments • 87

  • @DataProfessor
    @DataProfessor  4 years ago +39

    Did you find value in this end-to-end tutorial in bioinformatics/cheminformatics? If you would like more videos like this, please give it a 👍Like and ❤️Subscribe to the channel. Please share your thoughts and suggestions in the comments below 👇

  • @FrancoCiminoPrado
    @FrancoCiminoPrado 4 years ago +15

    I'm just starting with Python. I'm an organic chemist looking to change my field from wet lab to comp chem, and this is gold for me. Thank you very much.

    • @DataProfessor
      @DataProfessor  4 years ago +5

      Thanks Franco for the comment. I'm planning on making more of these data science for drug discovery videos.

    • @FrancoCiminoPrado
      @FrancoCiminoPrado 3 years ago +1

      @Kelrash31 Hi Alain, it's been slow but consistent. I've been working on QSAR and docking at the moment. I still haven't gotten that much into scripting for data management, but it's in the future plans.

    • @rahimakhatun4935
      @rahimakhatun4935 2 years ago

      @DataProfessor Hi, I am enthusiastic to learn QSAR and MD simulation for protein degraders. Do you recommend a particular blog or book for learning data science for drug design/optimization? Your lectures and explanations are fantastic; I am getting enlightened about all the pros and cons of CADD. Thank you very much.

  • @michaeloladunjoye5258
    @michaeloladunjoye5258 4 years ago +3

    I'm presently working on drug discovery with deep neural networks and I found this tutorial very helpful.

    • @DataProfessor
      @DataProfessor  4 years ago

      That's awesome Michael! Speaking of drug discovery, I have several more videos covering the topic. You are more than welcome to check them out: bit.ly/dataprofessor-bioinformatics

  • @epicakku5381
    @epicakku5381 3 years ago +1

    I'm a student in India with so much interest in bioinformatics, and I found you. Thank you so much! I'm currently studying to get into a college for a bioinformatics undergraduate degree.

    • @DataProfessor
      @DataProfessor  3 years ago +1

      It's my pleasure, welcome to the channel and welcome to bioinformatics 😊

  • @sametgumus1281
    @sametgumus1281 4 years ago +2

    Thank you, professor. Please share more about drug discovery.

    • @DataProfessor
      @DataProfessor  4 years ago +1

      Thanks Samet, my pleasure, please stay tuned by hitting the notification bell 😃

  • @traveldiaries347
    @traveldiaries347 3 years ago +3

    That's great. Kindly make a whole series on the drug discovery pipeline using ML/DL methods. Thank you.

    • @DataProfessor
      @DataProfessor  3 years ago +2

      Hi, thanks for the comment. This channel has got you covered. Make sure to go through this Bioinformatics playlist of 17 videos (more to come), which includes theory and step-by-step practice to get you started on bioinformatics projects. bit.ly/dataprofessor-bioinformatics

    • @traveldiaries347
      @traveldiaries347 3 years ago +2

      @DataProfessor Thank you so much, Professor.

  • @shwetaredkar734
    @shwetaredkar734 4 years ago +2

    Just loving the content you make.

    • @DataProfessor
      @DataProfessor  4 years ago +1

      Thanks Shweta for the kind comment!

  • @rasianaik9084
    @rasianaik9084 3 years ago +2

    Awesome series! Could you please make a video on extracting important information about drugs, like chemical structure, target proteins and side effects, from different databases? Thank you.

  • @keerthikonjety6257
    @keerthikonjety6257 3 years ago +1

    Precise and clear. Thank you so much!

    • @DataProfessor
      @DataProfessor  3 years ago

      Thanks for watching and for the kind words 😁

  • @sandeepurandur7930
    @sandeepurandur7930 3 years ago +2

    Hey, I'm a pharmacy graduate and I'm new to data science. We're working on solubility prediction, and your video seems useful for estimating the aqueous solubility of compounds. We have a few new compounds whose solubility needs to be estimated. If you could tell me how to use the above method for new compounds, it would be very helpful to us.

    • @DataProfessor
      @DataProfessor  3 years ago +3

      Hi, I've made a video on how to build a solubility prediction web app, you can check it out here th-cam.com/video/iZUH1qlgnys/w-d-xo.html
      A demo of this web app is also provided in the video description.

  • @gustavoespinoza7940
    @gustavoespinoza7940 1 year ago

    You can use the pandas apply function to simplify a lot of the computation involving the ratio between aromatic atoms and heavy atoms.
    If you define a function
    def foo(row):
        # compute the aromatic-to-heavy-atom ratio for a single row
        # row contains the fields for a given row in your pandas DataFrame
        # (the column names here are hypothetical)
        return row["aromatic_atoms"] / row["heavy_atoms"]
    then do
    df["aromatic_to_heavy"] = df.apply(foo, axis=1)  # axis=1 passes one row at a time
    I think with pandas it's best to use the built-in functions for iteration to save computational power.
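
A minimal sketch of the row-wise `apply` idea from this comment, assuming the aromatic atom and heavy atom counts already sit in two columns. The column names `aromatic_atoms` and `heavy_atoms` and the function name are hypothetical, not taken from the video's notebook; the three rows are benzene, toluene and ethanol. Note that `apply` with `axis=1` still calls the function once per row in Python, so the plain column division shown at the end is usually the cheaper option.

```python
import pandas as pd

# Hypothetical descriptor table: counts for benzene, toluene and ethanol.
df = pd.DataFrame({
    "smiles": ["c1ccccc1", "Cc1ccccc1", "CCO"],
    "aromatic_atoms": [6, 6, 0],
    "heavy_atoms": [6, 7, 3],
})

def aromatic_proportion(row):
    """Ratio of aromatic atoms to heavy atoms for a single row."""
    return row["aromatic_atoms"] / row["heavy_atoms"]

# axis=1 hands each row to the function; the default axis=0 would pass columns.
df["aromatic_to_heavy"] = df.apply(aromatic_proportion, axis=1)

# Vectorized alternative: divide the two columns directly.
df["aromatic_to_heavy_vectorized"] = df["aromatic_atoms"] / df["heavy_atoms"]

print(df)
```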

  • @marcofestu
    @marcofestu 4 years ago +3

    I was waiting for this one, thank you 😁

    • @DataProfessor
      @DataProfessor  4 years ago +1

      Thanks Marco for the comment! Glad to hear that!

    • @parobe6167
      @parobe6167 3 years ago

      @DataProfessor Great video! What books do you recommend? I am going to start my PhD in drug discovery. Moreover, if you have GitHub repos or Colab notebooks, all would be perfect for me. Thanks!

  • @ikechukwumichael1383
    @ikechukwumichael1383 2 months ago +1

    Thank you

  • @michalisgeorgiou2886
    @michalisgeorgiou2886 4 years ago +4

    Thank you for your videos, they are amazing!! Could you provide some theoretical or practical tips on data science for bioinformatics? E.g. which are the most used data preprocessing steps, feature selection steps, models and validation modes, and on which bioinformatics problems can we use them?

    • @DataProfessor
      @DataProfessor  4 years ago +1

      Thanks Michalis for the suggestion! I’ll put this excellent idea into the to-do list for future videos.

  • @josejonnyrodriguezfajardo4135
    @josejonnyrodriguezfajardo4135 3 years ago +1

    I have already subscribed to your channel. This is the first time I have found something like this after searching for this kind of video for a very long time. I'm very pleased with the information in this video. Congratulations 👏🎊🎉 dear Data Professor. I want to become a data scientist in the field of drug discovery and design. Can you advise me where to start and which book from the list below I should read first? Thank you.

  • @louisl7245
    @louisl7245 3 years ago +1

    Thanks. Your video makes for a great learning process.

  • @sebastianjorgecastro2452
    @sebastianjorgecastro2452 4 years ago +1

    I would like to suggest a video using RDKit for conformational search and energy minimization. I'm just starting my bioinformatics project and this video was really helpful! Thanks!

  • @zapy422
    @zapy422 4 years ago +2

    Very useful.
    Where to find good data for training?

    • @DataProfessor
      @DataProfessor  4 years ago +3

      Thanks for watching! There are larger datasets available on chemical databases such as ChEMBL, PubChem, BindingDB, etc. which can be used as external datasets to the dataset used in this video.

  • @vrschwrngsthrtkr22
    @vrschwrngsthrtkr22 4 years ago +4

    How can this be used to do DIY, at-home drug discovery? What I mean by this is, does this only have academic value, or can you apply the results to something you can easily get without a university degree or comparable credentials? As you might know, a small company started selling kits that allow you to genetically modify bacteria and frogs with CRISPR. I am looking for something along those lines.

    • @vrschwrngsthrtkr22
      @vrschwrngsthrtkr22 4 years ago

      So?

    • @DataProfessor
      @DataProfessor  4 years ago +2

      The computational model can definitely be built by anyone who follows the step-by-step tutorials. As for bringing the discovered knowledge to the next step, you may need to collaborate with many people (chemists, other biologists, FDA officials/clinical trials, etc.). Bringing a drug to market is a billion-dollar endeavor that involves many people and organizations.

    • @vrschwrngsthrtkr22
      @vrschwrngsthrtkr22 4 years ago +1

      @DataProfessor Not everyone lives in the United States. That being said, I conclude that this only has academic value.
      Stop and think for a moment about exceptions. Which experiments can you conduct that come as close as possible to real drug design without the need for paid chemists, other biologists, clinical trials?

    • @DataProfessor
      @DataProfessor  4 years ago +1

      @vrschwrngsthrtkr22 Thanks for the discussion. I agree that it takes a lot of resources to bring a drug to market. Actually, much of the budget for carrying out drug discovery and development comes from big pharmaceutical companies, while academia accounts for a minor portion.

    • @vrschwrngsthrtkr22
      @vrschwrngsthrtkr22 4 years ago +2

      @DataProfessor Which experiments can you conduct that come as close as possible to real drug design without the need for paid chemists, other biologists, clinical trials?

  • @ernestbonat2440
    @ernestbonat2440 3 years ago +1

    Excellent videos by the Data Professor. Feel free to read the following blog post on Medium: “Apply Machine Learning Algorithms for Genomics Data Classification”. It will help you understand how to apply machine learning algorithms to genomic data classification. The post covers the latest ML/AI technologies applied to human genomic data classification today.

  • @liaanggraini8667
    @liaanggraini8667 4 years ago +2

    Hi prof, thank you for posting and sharing knowledge. My background is computer science, and since last year I have been interested in data science and AI-driven drug discovery, especially drug interactions. However, along the way I have found it difficult to understand the biological data, processes and terms. Could you give me some tips to thrive in this field? I really want this field to be my primary research topic for my master's degree.

    • @DataProfessor
      @DataProfessor  4 years ago

      Hi Lia, thanks for sharing your interest in computational drug discovery. I've written some review articles on the topic that may provide some introductory viewpoints to the field.
      www.tandfonline.com/doi/full/10.1517/17460441.2015.1016497
      www.researchgate.net/publication/338639486_Best_Practices_for_Constructing_Reproducible_QSAR_Models
      A more complete list is at www.researchgate.net/profile/Chanin_Nantasenamat/research

    • @liaanggraini8667
      @liaanggraini8667 4 years ago

      @DataProfessor Hi professor, thank you so much for this. I am sorry for the late reply. I am starting to follow your videos so I can understand both coding and biological data at the same time. Hopefully we can collaborate on academic research in the future :). Keep spreading the knowledge, you are a great tutor.

  • @stefanrucman5352
    @stefanrucman5352 4 years ago +1

    Amazing 👌🙌👌 insightful

  • @negarmokhtari3411
    @negarmokhtari3411 3 years ago +1

    Can you help me find out how to count the number of atoms in a compound with rdkit? I want to use a 'non-carbon proportion' feature in my model!

    • @DataProfessor
      @DataProfessor  3 years ago

      Yes, you can use the .GetNumAtoms() function on the molecule object. More details are provided here: www.rdkit.org/docs/GettingStartedInPython.html
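
A short sketch of how the atom count from `.GetNumAtoms()` could feed the 'non-carbon proportion' feature asked about above. The definition used here (non-carbon heavy atoms divided by all heavy atoms) and the helper name `non_carbon_proportion` are assumptions for illustration, since the thread does not define the feature. For a molecule parsed from SMILES, hydrogens are implicit, so `GetNumAtoms()` counts heavy atoms only.

```python
from rdkit import Chem

def non_carbon_proportion(smiles):
    """Non-carbon heavy atoms / all heavy atoms (assumed definition)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    # Hydrogens are implicit after SMILES parsing, so this counts heavy atoms.
    n_atoms = mol.GetNumAtoms()
    # Atomic number 6 is carbon; count everything else.
    n_non_carbon = sum(1 for atom in mol.GetAtoms() if atom.GetAtomicNum() != 6)
    return n_non_carbon / n_atoms

phenol = Chem.MolFromSmiles("c1ccccc1O")
n_phenol_atoms = phenol.GetNumAtoms()
phenol_ratio = non_carbon_proportion("c1ccccc1O")
print(n_phenol_atoms, phenol_ratio)
```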

  • @xjeffrey344
    @xjeffrey344 2 years ago +1

    Thanks, professor. It is a really good tutorial. Can I use this method to predict the solubility of a chemical in a liquid solution (or lipid solubility)? If not, are there any suggestions or tools I can use for lipid solubility prediction? Thank you very much.

  • @RojinaPanta1
    @RojinaPanta1 25 days ago

    Are these descriptors any better than molecular fingerprints?

  • @afolabiowoloye804
    @afolabiowoloye804 10 months ago

    @Data Professor, many thanks

  • @waleedrashad822
    @waleedrashad822 2 years ago +1

    Perfect

  • @miroslavanedyalkova5174
    @miroslavanedyalkova5174 4 years ago +1

    Could you share the notebook? Very nice tutorial.

    • @DataProfessor
      @DataProfessor  4 years ago

      Thanks Miroslava for the kind comment. The link to the notebook code for all videos in this channel is in the video descriptions of all videos. For this video, the link is github.com/dataprofessor/code/blob/master/python/cheminformatics_predicting_solubility.ipynb

  • @pcliang2693
    @pcliang2693 4 years ago +1

    Love love, nice course.

  • @datascienceespanol869
    @datascienceespanol869 4 years ago +2

    Great exercise! I am also a Data Scientist but my videos are in spanish in case there's anyone interested!😁😁

    • @DataProfessor
      @DataProfessor  4 years ago

      Muy bien! Awesome channel Ana!

  • @satvikkg3059
    @satvikkg3059 4 years ago +2

    Can you please make a simple tutorial for GROMACS on Colab?

    • @DataProfessor
      @DataProfessor  4 years ago

      Satvik, coincidentally, it is in the making, I have already drafted a notebook but will film the video soon, please stay tuned. Please turn on the notifications so that you will be notified as soon as a new video comes out. Thanks for your suggestion!

  • @ropon-palaciosg.7760
    @ropon-palaciosg.7760 4 years ago +1

    I'm trying to predict FDA-approved drugs using pharmacophore modelling. Can I use a DNN method for this approach?

    • @DataProfessor
      @DataProfessor  4 years ago

      I think you can. A DNN is used to build the model; you'll have to decide which input you are using, e.g. SMILES, chemical structure images, descriptors, fingerprints, etc.

  • @Chimie-Universitaire
    @Chimie-Universitaire 2 years ago

    I am a doctor in organic and macromolecular chemistry. I want to predict the solubility of polymers using Delaney's prediction method. Could you please give me an idea of the first steps I should take?
    I want to run experiments and compare them with the predictions.
    Can you help me?

  • @kashafnaz_
    @kashafnaz_ 3 years ago

    Awesome

  • @aayushividhoy5943
    @aayushividhoy5943 1 year ago

    I have completed my biomedical engineering degree and am currently working in clinical SAS. Will I be able to switch jobs into this domain?

  • @aruchan9890
    @aruchan9890 4 years ago +1

    Could you please tell me how to get rdkit in a Python 2 Colab notebook?

    • @DataProfessor
      @DataProfessor  4 years ago

      Thanks for the comment. Owing to compatibility issues, the code in the header part of the code provided on the DataProfessor GitHub works optimally. Please kindly refer to the code link in the video description.

  • @SuperShiva619
    @SuperShiva619 4 years ago +1

    Will other ensemble algorithms like AdaBoost and GB be used?

    • @DataProfessor
      @DataProfessor  4 years ago +1

      Do you mean for this dataset? Yes, you can also use those here. This tutorial reproduces the research published by Delaney, so it also uses linear regression to match the approach used there.

    • @SuperShiva619
      @SuperShiva619 4 years ago

      @DataProfessor Thank you, professor, for the response.
      Could you also give some thoughts on how this model could help in the drug development process in the future?

  • @rasianaik9084
    @rasianaik9084 3 years ago

    Hello sir, how can I calculate pairwise drug similarity based on the chemical structure fingerprint corresponding to the 881 chemical structures defined in the PubChem database?

    • @DataProfessor
      @DataProfessor  3 years ago

      Hi, the pairwise molecular similarity can be computed using the Tanimoto coefficient. I think rdkit allows you to do that.
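
To make the Tanimoto coefficient in this reply concrete, here is a dependency-free sketch that treats each fingerprint as the set of its "on" bit positions and computes |A ∩ B| / |A ∪ B| for every pair. The drug names and bit positions are made up for illustration and are not real PubChem fingerprints; returning 0.0 for two empty fingerprints is a convention chosen here. rdkit has its own similarity functions for its fingerprint objects, which this sketch deliberately does not use.

```python
from itertools import combinations

def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as on-bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Both fingerprints empty: similarity is undefined, 0.0 chosen here.
        return 0.0
    return len(a & b) / len(union)

# Hypothetical fingerprints (illustrative bit positions only).
fingerprints = {
    "drug_A": {1, 5, 9, 12},
    "drug_B": {1, 5, 9, 30},
    "drug_C": {2, 7},
}

# Similarity for every unordered pair of drugs.
pairwise = {
    (x, y): tanimoto(fingerprints[x], fingerprints[y])
    for x, y in combinations(sorted(fingerprints), 2)
}

for pair, score in pairwise.items():
    print(pair, round(score, 3))
```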

    • @rasianaik9084
      @rasianaik9084 3 years ago

      @DataProfessor Thanks a lot. Is there any video of yours on that? I am new to this field and have to start from scratch. Any suggestions will be highly appreciated.

  • @bikashpradhan5954
    @bikashpradhan5954 4 years ago +1

    Is a PhD necessary for becoming a data scientist in a field like biotechnology or bioinformatics?

    • @DataProfessor
      @DataProfessor  4 years ago +1

      The answer really depends on the type of work that you would like to do. A PhD is not necessary to become a data scientist working in the field of biotechnology/bioinformatics. A PhD is necessary if you want to become a principal investigator and lead a research group (probably more applicable to academia, and maybe industry).

  • @JohnDoe-oo9ll
    @JohnDoe-oo9ll 3 years ago

    It's totally not my place to say, and I know the professor must have thought a lot about his "lisp", but I believe he can master the "s" sound if he focuses on how his teeth touch the tongue when pronouncing an open "eeeeeeeeeee" sound and slowly raising the tip of his tongue; AT SOME POINT STOP VIBRATING your throat (while making the "eee" sound) and just allow air to pass the tube created by the tongue and slowly raise the tip of your tongue to the roof of your mouth (it doesn't need to TOUCH the roof) without blocking the entire passageway for the air. ALSO, for him specifically, he might try to pull back the tongue (keeping contact with the same portion of the roof of the mouth) and use a more forward portion of the tip of the tongue. Air should NOT leave anywhere but from the front of the tongue.

  • @MrChristian331
    @MrChristian331 4 years ago +2

    I'm totally lost with the aromatic atoms part

    • @DataProfessor
      @DataProfessor  4 years ago +1

      Hi Kris, I have written a complementary article on Medium explaining this at link.medium.com/8OB3NXKwo9