Generating and classifying bootstrap replicates with test driven development (CC283)

  • Published on 4 Jul 2024
  • Now that we have a kmer database, we are ready to classify our sequences using the Naive Bayesian approach. We'll start by refactoring some code that we used to make the database and then we'll generate and classify our bootstrap replicates. Of course, everything will be done using test driven development (TDD). This episode is part of an ongoing effort to develop an R package that implements the naive Bayesian classifier.
    If you want to get a physical copy of R Packages: amzn.to/43pMR8L
    If you want a free, online version of R packages: r-pkgs.org/
    You can find my blog post for this episode at www.riffomonas.org/code_club/....
    Check out the GitHub repository as it stood at the:
    * Beginning of the episode: github.com/riffomonas/phyloty...
    * End of the episode: github.com/riffomonas/phyloty...
    #rstats #refactor #testthat #tdd #microbenchmark #vectors #rdp #16S #classification #classifier #microbialecology #microbiome
    Support Riffomonas by becoming a Patreon member!
    / riffomonas
    Want more practice on the concepts covered in Code Club? You can sign up for my weekly newsletter at shop.riffomonas.org/youtube to get practice problems, tips, and insights.
    If you're interested in purchasing a video workshop be sure to check out riffomonas.org/workshops/
    You can also find complete tutorials for learning R with the tidyverse using...
    Microbial ecology data: www.riffomonas.org/minimalR/
    General data: www.riffomonas.org/generalR/
    0:00 Introduction
    4:32 Adding packages to DESCRIPTION file
    8:20 Designing our classification function
    10:55 Refactor generation of kmers from sequences
    20:33 Generate bootstrap replicates of kmers
    26:54 Classify kmers
  • Science & Technology
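The chapters above walk through three steps: turning a sequence into k-mers, drawing bootstrap replicates of those k-mers, and classifying each replicate. Below is a minimal Python sketch of that pipeline. It is not the phylotypr R code. It assumes the word-probability formulas of the RDP naive Bayesian classifier (Wang et al., 2007), 8-mers by default, and bootstrap replicates that each draw one-eighth of the query's k-mers with replacement. Every function name here is hypothetical.

```python
import random
from collections import Counter


def get_kmers(sequence, k=8):
    """Hypothetical helper: all overlapping k-mers (words) in a DNA sequence."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_database(references, k=8):
    """Hypothetical helper. references maps genus -> list of reference sequences.

    Returns (word_probs, defaults): word_probs[genus][kmer] = P(kmer | genus) and
    defaults[genus] = P(kmer | genus) for a k-mer absent from every reference.
    Assumes the RDP formulas: P_w = (n(w) + 0.5) / (N + 1) and
    P(w | G) = (m(w) + P_w) / (M + 1).
    """
    # Presence/absence: each reference contributes the set of its k-mers
    ref_words = {g: [set(get_kmers(s, k)) for s in seqs]
                 for g, seqs in references.items()}
    n_refs = sum(len(sets) for sets in ref_words.values())                    # N
    n_w = Counter(w for sets in ref_words.values() for s in sets for w in s)  # n(w)
    prior = {w: (c + 0.5) / (n_refs + 1) for w, c in n_w.items()}              # P_w
    unseen_prior = 0.5 / (n_refs + 1)  # P_w for a k-mer with n(w) = 0

    word_probs, defaults = {}, {}
    for genus, sets in ref_words.items():
        m_w = Counter(w for s in sets for w in s)  # m(w): references in this genus with w
        n_genus = len(sets)                        # M
        word_probs[genus] = {w: (m_w[w] + p) / (n_genus + 1)
                             for w, p in prior.items()}
        defaults[genus] = unseen_prior / (n_genus + 1)
    return word_probs, defaults


def classify_kmers(kmers, word_probs, defaults):
    """Genus with the largest product of P(kmer | genus) over the given k-mers."""
    scores = {}
    for genus, probs in word_probs.items():
        score = 1.0
        for w in kmers:
            score *= probs.get(w, defaults[genus])
        scores[genus] = score
    return max(scores, key=scores.get)


def bootstrap_classify(sequence, word_probs, defaults, k=8, n_boot=100, seed=1):
    """Classify n_boot bootstrap replicates; return (most frequent genus, support).

    Assumption: each replicate draws len(kmers) // 8 k-mers (at least 1) with
    replacement. The sequence must be at least k bases long.
    """
    rng = random.Random(seed)
    kmers = get_kmers(sequence, k)
    n_draw = max(1, len(kmers) // 8)
    calls = Counter(
        classify_kmers(rng.choices(kmers, k=n_draw), word_probs, defaults)
        for _ in range(n_boot)
    )
    genus, hits = calls.most_common(1)[0]
    return genus, hits / n_boot


# Toy usage with 3-mers so the example stays short
refs = {"GenusA": ["ACGTACGTAC", "ACGTACGTTT"],
        "GenusB": ["GGGCCCGGGC", "GGGCCCTTTC"]}
probs, defaults = build_database(refs, k=3)
print(bootstrap_classify("ACGTACGTA", probs, defaults, k=3))  # ('GenusA', 1.0)
```

Reporting the genus chosen most often, along with the fraction of replicates that chose it, is just one simple way to summarize the bootstrap. The product in `classify_kmers` mirrors the multiply-the-probabilities approach discussed in the comments below.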

Comments • 6

  • @djangoworldwide7925 · months ago

    The amount of work is amazing, and then to document all of that! BTW, will you include error handling? I always feel it's a big thing: whether you round to an integer or a positive number if the user passes size = -8 or 8.5, for example, or just have the function fail. There's a subtle line between being explicit and treating the user like a baby.
    I guess it depends on WHO the expected user is (programmer? scientist? student?).
    Would love to see your ideas on error handling and how you'll implement it concisely across all your user-exported functions!

    • @Riffomonas · months ago

      Oooh. Good point! We'll definitely have to come back to error checking before it's all said and done. From our experience developing mothur we've come to appreciate how users can use a tool in ways we never could have imagined :)
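One concise way to handle the size = -8 or 8.5 question is a single shared validator that every exported function calls first, so the policy lives in one place. Here is a minimal Python sketch that assumes a fail-loudly policy. It accepts whole-number floats such as 8.0 but rejects 8.5, negatives, zero, strings, and booleans. `validate_size` is a hypothetical name, not part of phylotypr.

```python
import numbers


def validate_size(size):
    """Hypothetical shared check: return size as an int if it is a positive
    whole number, otherwise raise ValueError with a message naming the input."""
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(size, bool) or not isinstance(size, numbers.Integral):
        if isinstance(size, float) and size.is_integer():
            size = int(size)  # 8.0 -> 8; NaN and inf fail is_integer()
        else:
            raise ValueError(f"size must be a positive whole number, got {size!r}")
    if size <= 0:
        raise ValueError(f"size must be a positive whole number, got {size!r}")
    return int(size)


print(validate_size(8))    # 8
print(validate_size(8.0))  # 8
```

Silently rounding 8.5 would hide a likely user mistake, which is why this sketch fails instead. A tool aimed at beginners might prefer to coerce the value and emit a warning.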

  • @csalt3689 · months ago

    Anyone in microbial ecology should watch this series. Now I understand what's going on behind the scenes when I classify my sequences with a naive Bayes classifier. Question: on line 155 in the video you use apply(), but I think your GitHub code on line 156 has: "probabilities

    • @Riffomonas · months ago

      Wonderful! You jumped ahead in the stream of commits/episodes :) I'll eventually switch from multiplying the probabilities to adding their logarithms, to avoid generating numbers smaller than the computer can represent. The repository as of the end of this episode is linked in the show notes: github.com/riffomonas/phylotypr/tree/7a5cac20692406321bf982b2046f947daf45a84f

    • @csalt3689 · months ago

      @Riffomonas Cool! That's a good idea. I'm curious to see how those two approaches compare. BTW, I've been coding along with you but using Python (hope you don't mind 🙂). This is such a super inspiring project.

    • @Riffomonas · months ago

      @csalt3689 That's awesome! I'd love to see how your Python code compares to my R code (and our earlier C++ code).
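The planned switch from multiplying probabilities to adding their logs works for two reasons. First, log(xy) = log(x) + log(y). Second, log is strictly increasing, so the genus with the largest product also has the largest log sum, and that sum stays in a range floating point can represent. Here is a minimal Python sketch (3.8+ for `math.prod`) using made-up per-k-mer probabilities for two hypothetical genera:

```python
import math

# Hypothetical per-k-mer probabilities for two genera over the same 400 k-mers
probs_a = [0.002] * 400
probs_b = [0.001] * 400

# Multiplying: the true products (about 1e-1080 and 1e-1200) are far below the
# smallest positive float64 (about 5e-324), so both underflow to 0.0 and tie
prod_a, prod_b = math.prod(probs_a), math.prod(probs_b)

# Adding logs: the sums stay finite and keep the ordering the exact products would have
log_a = sum(math.log(p) for p in probs_a)
log_b = sum(math.log(p) for p in probs_b)

print(prod_a, prod_b)  # 0.0 0.0
print(log_a > log_b)   # True
```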